Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
10-30-2020 02:26 PM
TL;DR:
Slow Cypher Query For Large Dataset.. Why?
QUERY:
EXPLAIN
WITH "(?i).test.*" as target
MATCH (AA:A)-[r:B]-(CC:C)
WHERE ANY(host IN AA.D WHERE host =~ target)
OPTIONAL MATCH (DD:D) WHERE CC.id =~ DD.CCid
OPTIONAL MATCH (EE:E) WHERE CC.id =~ EE.CCid
OPTIONAL MATCH (FF:F) WHERE CC.id =~ FF.CCid
OPTIONAL MATCH (GG:G) WHERE CC.id =~ GG.CCid
OPTIONAL MATCH (HH:H) WHERE CC.id =~ HH.CCid
WITH COLLECT(AA) + COLLECT(CC) + COLLECT(DD)
+ COLLECT(EE) + COLLECT(FF) + COLLECT(GG)
+ COLLECT(HH) as data
UNWIND data as datum
RETURN DISTINCT datum;
CONTEXT:
Without Neo4J Enterprise (procurement delays), I've been trying to performance tune some queries by hand using
EXPLAIN
andPROFILE
, with limited success.I'd like to leverage the wisdom of crowds at this point, if feasible..
Any insights to be shared about the performance heuristics (or obvious nonperformant clauses) of the query above would be immensely appreciated.
Thanks,
Neo4j User
10-30-2020 04:12 PM
First, none of these will leverage indexes for lookup, since the =~ regex match operator doesn't use the indexes. If you know the case of the property, then you can use the STARTS WITH
or CONTAINS
keywords, which will leverage indexes if they are present.
Next, you have a multiplicative cardinality issue from all those back to back optional matches. We recommend you collect as soon as possible after you've matched the thing, that way cardinality will reset (since the grouping key becomes distinct). The way you have it now, the row cardinality will multiply out with each successive OPTIONAL MATCH, which means your query is doing more and more unnecessary work.
so the pattern will be something like:
...
OPTIONAL MATCH ... WHERE ...
WITH <variables already in scope>, collect(<whatever you're matching to this time>) as <variablename>
OPTIONAL MATCH ... WHERE ...
etc.
...
All the sessions of the conference are now available online