Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
11-26-2020 07:21 AM
What is the right way to execute and get results from a query that takes hours to process? Or is just Neo4j not built to do such things? Or is there some hidden timeout limit that silently kills such queries? Or do I need the enterprise edition?
Obviously running the query in browser and waiting for the result is out of the question.
I had some success with wrapping the query into apoc.export.cypher.query() and executing it locally through cypher-shell running in background using '&' on Linux as suggested here Cypher-Shell - how to run testquery.cypher in background. Works for queries taking e.g. 20 minutes but more demanding queries seems to just die silently without leaving any trace in the logs.
11-27-2020 04:24 AM
There are many things that may cause a query to take a long time to run. Usually, some query optimisation and/or model refactoring can make huge impacts on this.
Are you able to share your queries/talk about what optimisations you have done to them?
Also, how much memory have you configured for the queries to run?
Cheers,
Lju
11-27-2020 09:22 AM
see the very good video:
Not knowing specifics of your problem, here are some random ideas (stabs in the dark and I don't mean to insult your intelligence if you already know about these...):
toInteger()
and toFloat()
and store them in the DB before doing the query)MATCH(a:A),(b:B) WHERE ...
use MATCH(a:A) MATCH(b:B) WHERE...
MATCH(a)-[r]-> ...
is more expensive than MATCH(a:A)-[r:R]->...
length(a.property)=0
is more expensive than a.property IS NULL
but I don't know how much more expensive it is.I hope this helps. If you could show us your query (along with some statistics: call apoc.meta.stats
and :schema
) perhaps we could better be able to help you.
12-07-2022 09:16 AM
Null check in neo4j is more expensive than 'not null' check. I am performing both on an indexed field and null check takes forever.
11-27-2020 10:42 AM
Thank you both for valuable tips. Now lets say, hypothetically, that the query is as optimized as it can be and I am perfectly ok with it running for 2 hours before I get the result because it is some complex computation or extraction of a large subgraph that simply takes a lot of time.
Now is there some obstacle, like some default dbms.transaction.timeout that will stop me from getting the result from Neo4j?
12-04-2020 04:23 AM
In this hypothetical situation, provided that you've managed your query in such a way that the memory load is controlled, yes, you could set configuration in such a way to ensure the query runs for as long as it needs to
11-27-2020 03:27 PM
Another link on query optimizations:
I wasn't clearly your question on time out...
But if you set the timeout to 0, it won't timeout:
11-28-2020 06:55 PM
I found another interesting video. What is very interesting is at minute 15:30.
If your query has TWO things that you are filtering and that has an index, then it's better to force Neo4J to use both indexes.
I'm surprised by this!. (I tried this with version 4.2)
Their example using the Movie DB. Instead of this:
PROFILE MATCH p = (p1:Person)-[:ACTED_IN*6]-(p5:Person)
WHERE p1.name='Tom Cruise' and p5.name='Kevin Bacon'
RETURN [n in nodes(p) | coalesce(n.title, n.name)]
do this:
PROFILE MATCH p = (p1:Person)-[:ACTED_IN*6]-(p5:Person)
USING INDEX p1:Person(name)
USING INDEX p5:Person(name)
WHERE p1.name='Tom Cruise' and p5.name='Kevin Bacon'
RETURN [n in nodes(p) | coalesce(n.title, n.name)]
12-07-2022 09:14 AM
If you don't need some fields in return don't return them it might greatly increase the time needed to return the values. Instead filter properties like this:
WITH keys(e) AS k1, e
UNWIND k1 AS k2
WITH e, k2 where k2 <> "relationship_string" and k2 <> "relationshipString" and RETURN id(e) AS entityId, k2 AS Prop,e[k2] AS Value
All the sessions of the conference are now available online