01-31-2019 03:03 AM
Hello community!
We are optimizing some Cypher queries, and so far everything works well except for one thing. Here is our query:
MATCH path=(s:Source)-[:Link]->(:A1)-[:Link]->(:A2)-[:Link]->(:Sink)
WITH [n IN nodes(path) | id(n)] AS node_traces
RETURN DISTINCT node_traces
If we remove DISTINCT from the RETURN clause:
MATCH path=(s:Source)-[:Link]->(:A1)-[:Link]->(:A2)-[:Link]->(:Sink)
WITH [n IN nodes(path) | id(n)] AS node_traces
RETURN node_traces
We expected to get the same output, but some paths are matched twice. Our datasets are large, so we would rather not use DISTINCT.
On the other hand, for smaller TIMs (<4) we get the same output. For example, the following two queries return the same number of paths:
MATCH path=(s:Source)-[:Link]->(:A1)-[:Link]->(:Sink)
WITH [n IN nodes(path) | id(n)] AS node_traces
RETURN node_traces
MATCH path=(s:Source)-[:Link]->(:A1)-[:Link]->(:Sink)
WITH [n IN nodes(path) | id(n)] AS node_traces
RETURN DISTINCT node_traces
Can anyone explain that phenomenon?
01-31-2019 03:32 AM
There might be several Link relationships between the same two nodes, and each one produces a separate path.
Uniqueness is enforced on relationships, not on nodes, so paths that visit the same nodes through different relationships are all returned.
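To make this concrete, here is a small Python sketch (not Neo4j code) that enumerates pattern matches over a hypothetical toy graph following the rule above: a relationship may not repeat within a path, but node sequences are not deduplicated. The graph, node names and relationship ids are made up. The parallel edges r3 and r4 between a2 and sink stand in for duplicate Link relationships.

```python
from collections import Counter

# Hypothetical toy graph (not your data): every edge is a :Link relationship,
# written as (relationship_id, start_node, end_node).
# r3 and r4 are two parallel relationships between a2 and sink.
EDGES = [
    ("r1", "source", "a1"),
    ("r2", "a1", "a2"),
    ("r3", "a2", "sink"),
    ("r4", "a2", "sink"),
    ("r5", "a1", "sink"),
]
LABELS = {"source": "Source", "a1": "A1", "a2": "A2", "sink": "Sink"}


def match_paths(label_chain):
    """Return every directed path whose node labels follow label_chain.

    A relationship may appear at most once per path, but nodes are not
    deduplicated. Each result is a (node_trace, rel_trace) pair.
    """
    results = []

    def extend(nodes, rels):
        if len(nodes) == len(label_chain):
            results.append((tuple(nodes), tuple(rels)))
            return
        wanted = label_chain[len(nodes)]
        for rel_id, start, end in EDGES:
            if start == nodes[-1] and rel_id not in rels and LABELS[end] == wanted:
                extend(nodes + [end], rels + [rel_id])

    for node, label in LABELS.items():
        if label == label_chain[0]:
            extend([node], [])
    return results


three_hop = match_paths(["Source", "A1", "A2", "Sink"])
two_hop = match_paths(["Source", "A1", "Sink"])
three_hop_traces = [nodes for nodes, _ in three_hop]
two_hop_traces = [nodes for nodes, _ in two_hop]

# The same node trace appears twice: once via r3 and once via r4.
print(Counter(three_hop_traces))  # Counter({('source', 'a1', 'a2', 'sink'): 2})
print(len(two_hop_traces), len(set(two_hop_traces)))  # 1 1
```

In this toy graph the duplicate only appears in the 3-hop pattern, because the parallel relationships sit on the A2-to-Sink hop, which the shorter Source-A1-Sink pattern never traverses. Something similar in the real data could explain why the smaller patterns return the same count with and without DISTINCT.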
How much does DISTINCT actually affect your query time?
Did you try:
WITH distinct nodes(path) as nodes
RETURN [s in nodes | id(s)] AS node_traces
I don't think there is a built-in path-uniqueness operation at the moment, since it would still require past paths to be kept in a data structure to compare against.
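The point about keeping past paths can be illustrated with a generic streaming deduplication in Python. This is only a sketch of the general technique, not a description of how Neo4j implements DISTINCT. The function name `distinct_stream` and the sample traces are hypothetical.

```python
from typing import Hashable, Iterable, Iterator


def distinct_stream(rows: Iterable[Hashable]) -> Iterator[Hashable]:
    """Yield each row the first time it appears, preserving input order.

    Rows are consumed lazily, but every distinct row has to stay in `seen`
    so later rows can be compared against it, so memory grows with the
    number of distinct results.
    """
    seen = set()
    for row in rows:
        if row not in seen:
            seen.add(row)
            yield row


# Hypothetical node-id traces; the first two are identical.
traces = [(1, 2, 3, 4), (1, 2, 3, 4), (1, 5, 3, 4)]
print(list(distinct_stream(traces)))  # [(1, 2, 3, 4), (1, 5, 3, 4)]
```

Output can still be streamed, but the set of seen traces is the unavoidable cost of any exact deduplication.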
Are you using Enterprise Edition with the slotted runtime?