Neo4j

myshareit · ‎07-18-2022

I have the following Cypher query:

MATCH (dg:DecisionGroup {id: -2})-[rdgd:CONTAINS]->(childD:Decision:Profile ) 
MATCH (childD)-[:EMPLOYMENT_AS]->(root2:Employment ) 
WHERE root2.id IN ([1]) WITH DISTINCT childD, dg, rdgd  MATCH path3=(root3:Location )-[:CONTAINS*0..]->(descendant3:Location) 
WHERE (descendant3.id IN ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]) 
OR root3.id IN ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35])) 
UNWIND nodes(path3) AS pathNode3 
WITH childD, dg, rdgd, COLLECT(DISTINCT pathNode3) AS pathNodes3 
MATCH (childD)-[:LOCATED_IN]->(pathNode3) 
WHERE pathNode3 IN pathNodes3 
RETURN DISTINCT childD

which runs slowly:

Could you please help to optimize this query? I need to check that childD variable :LOCATED_IN Locations (with provided IDs) or their descendants

dana_canzano · ‎07-18-2022

@myshareit

Is there an index on :Location and property id?

glilienfield · ‎07-18-2022

In addition to the index improvements recommended by @dana_canzano, I think you can also optimize the query. The way I read it, you are performing the same 'match' query to get the 'path3' for each 'childD, dg, rdgd' combination. This is because a Cartesian product is computed when you have the 'with' followed by the 'match'. I removed that by collecting all the 'childD' nodes and passing this with the 'with.' Since the collection result in only one row, only one 'match' is executed. The result is a row for each 'path3', with the same 'children' list appended to each row. I also removed the passing of 'rd' and 'rdgd', since they were not utilized in the rest of the query.

What it seems you are returning is the list of all childD nodes that have an existing relationships with any of the nodes along any of the path3 rows that are returned. The refactored query should give you that.

Sorry, I did have any test data to validate it gives the correct results.

MATCH (dg:DecisionGroup {id: -2})-[rdgd:CONTAINS]->(childD:Decision:Profile ) 
MATCH (childD)-[:EMPLOYMENT_AS]->(root2:Employment ) 
WHERE root2.id IN ([1]) 
WITH collect(DISTINCT childD) as children
MATCH path3=(root3:Location )-[:CONTAINS*0..]->(descendant3:Location) 
WHERE (descendant3.id IN ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35]) 
OR root3.id IN ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35])) 
UNWIND nodes(path3) AS pathNode3
WITH distinct pathNode3, children
UNWIND children as child
MATCH (child)-[:LOCATED_IN]->(pathNode3) 
RETURN distinct child

Neo4j

Neo4j Cypher query execution optimization