Neo4j

pjljvandelaar · 11-16-2022

Dear Neo4j developers,

I have two Cypher queries that I think should be equivalent:

MATCH (f:CppFunctionDeclaration)
WHERE exists(()-[:CppCalls]->(f))
  AND exists(()-[:CppCalls*2..5]->(f))
  AND exists(()-[:CppCallsOverride]->(f))
RETURN f.class, f.function
ORDER BY f.class, f.function

MATCH directCalls = ()-[:CppCalls]->(f:CppFunctionDeclaration),
      indirectCalls = ()-[:CppCalls*2..5]->(f),
      directOverrides = ()-[:CppCallsOverride]->(f)
WITH f,
     count(directCalls) AS nrofDirectCalls,
     count(indirectCalls) AS nrofIndirectCalls,
     count(directOverrides) AS nrofDirectOverrides
WHERE 0 < nrofDirectCalls
  AND 0 < nrofIndirectCalls
  AND 0 < nrofDirectOverrides
RETURN f.class, f.function
ORDER BY f.class, f.function

However, when I run them on my database, I get

Started streaming 19 records after 15 ms and completed after 145 ms.

Started streaming 13 records after 13 ms and completed after 3783 ms.

respectively.

Can someone please explain why exists (<paths>) is different from 0 < count (<paths>)?

Thanks in advance,
Pierre

glilienfield · 11-16-2022

I think you should get the same set of ‘f’ records, but you will have many duplicates in the second approach.

The two queries are doing different things. The first one finds all CppFunctionDeclaration nodes and then keeps only those that have at least one path matching each pattern in your exists clauses. Note that each exists only needs to find one path; then it stops looking.
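
The short-circuit behavior described above can be pictured with a minimal Python sketch. It is a toy model, not how Neo4j is implemented: the "paths" are hypothetical placeholder strings, exists_style stands in for exists(<pattern>) by stopping at the first match, and count_style stands in for 0 < count(<pattern>) by enumerating every match before comparing.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def examined(paths: Iterable[str], log: list[str]) -> Iterator[str]:
    """Yield paths one at a time, recording every path the caller pulls."""
    for p in paths:
        log.append(p)
        yield p


def exists_style(paths: Iterable[str]) -> tuple[bool, int]:
    """Stand-in for exists(<pattern>): stop at the first match."""
    log: list[str] = []
    found = any(True for _ in examined(paths, log))
    return found, len(log)


def count_style(paths: Iterable[str]) -> tuple[bool, int]:
    """Stand-in for 0 < count(<pattern>): enumerate every match first."""
    log: list[str] = []
    n = sum(1 for _ in examined(paths, log))
    return n > 0, len(log)


# Hypothetical: 1000 incoming paths that all match the pattern.
incoming = [f"path{i}" for i in range(1000)]
print(exists_style(incoming))  # (True, 1)
print(count_style(incoming))   # (True, 1000)
```

Both give the same yes/no answer, but the count-based check has to examine every matching path to get there.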

The second query finds all directCalls paths for each CppFunctionDeclaration node, and there can be more than one per node. For each of those paths, the second pattern is matched, finding all indirectCalls paths, so the number of rows is now the product of the two results. For each of those rows, the third pattern is matched, producing an even larger set of rows: again the cross product of the results. After counting and filtering, you end up with a lot more rows of data than in the first query, which I suspect contains duplicate data for the ‘f’ nodes due to the three-way Cartesian product.
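
Here is a minimal Python sketch of that row multiplication for a single f node. The path lists are hypothetical placeholders, not data from the original database. It also simplifies one thing: Cypher does not bind the same relationship twice within one MATCH clause, so in a real graph some combinations can be dropped. It also shows that each count() in the WITH clause counts rows, so all three come out equal to the product rather than to the number of paths per pattern.

```python
from itertools import product

# Hypothetical matches for one f node (placeholder names only).
direct_calls = ["d1", "d2", "d3"]           # ()-[:CppCalls]->(f)
indirect_calls = ["i1", "i2", "i3", "i4"]   # ()-[:CppCalls*2..5]->(f)
direct_overrides = ["o1", "o2"]             # ()-[:CppCallsOverride]->(f)

# One MATCH with three comma-separated patterns yields one row per
# combination of matches (simplified: relationship uniqueness ignored).
rows = list(product(direct_calls, indirect_calls, direct_overrides))

# WITH f, count(directCalls), ... aggregates over those rows. Every row
# has a non-null value in each column, so each count equals len(rows).
nr_direct = sum(1 for d, _, _ in rows if d is not None)
nr_indirect = sum(1 for _, i, _ in rows if i is not None)
nr_overrides = sum(1 for _, _, o in rows if o is not None)

print(len(rows))                             # 24
print(nr_direct, nr_indirect, nr_overrides)  # 24 24 24

# If any one pattern has no match, there are no rows at all for this f.
print(len(list(product([], indirect_calls, direct_overrides))))  # 0
```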

I also believe the longer run time is a result of having to find and process more data.

As I see it, always use an ‘exists’ clause if you only need to know whether a path exists.

pjljvandelaar · 11-20-2022

The goal was to get the counts.

Your reply taught me that another solution was needed.

glilienfield · 11-21-2022

If the count for each pattern is what you need, you can try the following:

MATCH (f:CppFunctionDeclaration)
RETURN f.class, f.function,
       size([(o)-[:CppCalls]->(f) | o]) AS c1,
       size([(o)-[:CppCalls*2..5]->(f) | o]) AS c2,
       size([(o)-[:CppCallsOverride]->(f) | o]) AS c3
ORDER BY f.class, f.function

The following syntax also works, but passing a pattern to size() is deprecated, so the pattern-comprehension syntax above is the recommended one. I recall reading something new in this regard in release 5.x.

MATCH (f:CppFunctionDeclaration)
RETURN f.class, f.function,
       size(()-[:CppCalls]->(f)) AS c1,
       size(()-[:CppCalls*2..5]->(f)) AS c2,
       size(()-[:CppCallsOverride]->(f)) AS c3
ORDER BY f.class, f.function
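
Both queries above evaluate each pattern separately for every f, so, unlike the comma-separated MATCH in the question, the three numbers are not multiplied together. The following is a rough Python sketch of what each size([...]) computes, on a small hypothetical graph (node names and relationship ids are made up): one list element per matched path, with no relationship repeated inside a single path. Note that it counts paths, not distinct callers, so a caller reachable along two different paths of the same pattern is counted twice.

```python
from collections import defaultdict

# Hypothetical toy graph: (relationship id, start node, type, end node).
EDGES = [
    (1, "a", "CppCalls", "b"),
    (2, "b", "CppCalls", "c"),
    (3, "c", "CppCalls", "f"),
    (4, "a", "CppCalls", "f"),
    (5, "x", "CppCallsOverride", "f"),
]


def incoming_index(rel_type):
    """Map each end node to its incoming (relationship id, start node) pairs."""
    index = defaultdict(list)
    for rid, start, rtype, end in EDGES:
        if rtype == rel_type:
            index[end].append((rid, start))
    return index


def size_of_comprehension(target, rel_type, min_len, max_len):
    """Rough equivalent of size([(o)-[:rel_type*min_len..max_len]->(target) | o]).

    One list element per matched path; a relationship may not appear
    twice within the same path.
    """
    index = incoming_index(rel_type)

    def walk(node, length, used):
        # The path from `node` to `target` has `length` relationships.
        total = 1 if min_len <= length <= max_len else 0
        if length < max_len:
            for rid, start in index[node]:
                if rid not in used:
                    total += walk(start, length + 1, used | {rid})
        return total

    return walk(target, 0, frozenset())


c1 = size_of_comprehension("f", "CppCalls", 1, 1)
c2 = size_of_comprehension("f", "CppCalls", 2, 5)
c3 = size_of_comprehension("f", "CppCallsOverride", 1, 1)
print(c1, c2, c3)  # 2 2 1
```

In this toy graph f has two direct callers (a and c), two indirect call paths (b->c->f and a->b->c->f) and one override (x), and each number is computed without reference to the other two.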

Why do queries return different results?