cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Best practices for queries that can take hours to complete

What is the right way to execute and get results from a query that takes hours to process? Or is just Neo4j not built to do such things? Or is there some hidden timeout limit that silently kills such queries? Or do I need the enterprise edition?

Obviously running the query in browser and waiting for the result is out of the question.

I had some success with wrapping the query into apoc.export.cypher.query() and executing it locally through cypher-shell running in background using '&' on Linux as suggested here Cypher-Shell - how to run testquery.cypher in background. Works for queries taking e.g. 20 minutes but more demanding queries seems to just die silently without leaving any trace in the logs.

8 REPLIES 8

Hi @marek.kasle.na

There are many things that may cause a query to take a long time to run. Usually, some query optimisation and/or model refactoring can make huge impacts on this.

Are you able to share your queries/talk about what optimisations you have done to them?

Also, how much memory have you configured for the queries to run?

Cheers,

Lju

clem
Graph Steward

see the very good video:


which also points to:

Not knowing specifics of your problem, here are some random ideas (stabs in the dark and I don't mean to insult your intelligence if you already know about these...):

  • Make sure you have indexes created for things that you are filtering on. Also potentially useful is composite indexes: https://maxdemarzi.com/2020/02/19/composite-indexes-in-neo4j-4-0/#more-5696
  • Use integers or floats where possible (e.g. toInteger() and toFloat() and store them in the DB before doing the query)
  • Avoid cartesian products: Instead of MATCH(a:A),(b:B) WHERE ... use MATCH(a:A) MATCH(b:B) WHERE...
  • Use Labels in your queries. MATCH(a)-[r]-> ... is more expensive than MATCH(a:A)-[r:R]->...
  • Try parallelism: https://neo4j.com/labs/apoc/4.2/cypher-execution/parallel/ (I haven't tried it myself, so I'm not sure how it works exactly.)
  • figure out how to winnow down the possible matches as much as possible and as soon as possible in the query.
  • I suspect this is true: take advantage of NULL values instead of setting things to empty strings or 0. I believe that property looks will go faster if Neo4J sees that there is no property, instead of looking up the property and then getting its value to do the comparison. E.g. length(a.property)=0 is more expensive than a.property IS NULL but I don't know how much more expensive it is.

I hope this helps. If you could show us your query (along with some statistics: call apoc.meta.stats and :schema) perhaps we could better be able to help you.

Null check in neo4j is more expensive than 'not null' check. I am performing both on an indexed field and null check takes forever.

Thank you both for valuable tips. Now lets say, hypothetically, that the query is as optimized as it can be and I am perfectly ok with it running for 2 hours before I get the result because it is some complex computation or extraction of a large subgraph that simply takes a lot of time.
Now is there some obstacle, like some default dbms.transaction.timeout that will stop me from getting the result from Neo4j?

In this hypothetical situation, provided that you've managed your query in such a way that the memory load is controlled, yes, you could set configuration in such a way to ensure the query runs for as long as it needs to

clem
Graph Steward

Another link on query optimizations:

I wasn't clearly your question on time out...

But if you set the timeout to 0, it won't timeout:

clem
Graph Steward

I found another interesting video. What is very interesting is at minute 15:30.

If your query has TWO things that you are filtering and that has an index, then it's better to force Neo4J to use both indexes.

I'm surprised by this!. (I tried this with version 4.2)

Their example using the Movie DB. Instead of this:

PROFILE MATCH p = (p1:Person)-[:ACTED_IN*6]-(p5:Person)
WHERE p1.name='Tom Cruise' and p5.name='Kevin Bacon' 
RETURN [n in nodes(p) | coalesce(n.title, n.name)]

do this:

PROFILE MATCH p = (p1:Person)-[:ACTED_IN*6]-(p5:Person) 
USING INDEX p1:Person(name)
USING INDEX p5:Person(name)
WHERE p1.name='Tom Cruise' and p5.name='Kevin Bacon' 
RETURN [n in nodes(p) | coalesce(n.title, n.name)]

If you don't need some fields in return don't return them it might greatly increase the time needed to return the values. Instead filter properties like this:

WITH keys(e) AS k1, e
UNWIND k1 AS k2
WITH e, k2 where k2 <> "relationship_string" and k2 <> "relationshipString" and RETURN id(e) AS entityId, k2 AS Prop,e[k2] AS Value