Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
09-21-2020 01:22 AM
Hi Team,
I have a simple graph that has Nodes which represent duplicate record id in the below form
Duplicate Id, Original Id
A,B
B,C
C,D
X,Y
Y,Z
The directed graph looks like A -> B ->C ->D and I want CSV result that looks like below that will represent the ultimate Original Id
A,D
B,D
C,D
X,Z
Y,Z
Have been struggling for more than a day being newbie to graph databases, any help will be highly appreciated.
Thanks
09-28-2020 11:30 AM
Here ya go:
Cypher:
MATCH p=(a:Ident)-[:ORIGID*]->(b:Ident)
WHERE NOT (b)-[:ORIGID]->(:Ident)
WITH a, LAST(nodes(p)) as b
RETURN a.name, b.name
ORDER BY a.name
09-20-2021 05:40 AM
Thank you for the reply and it worked well for small clusters however I have 1 large cluster of 50 connected nodes and it is struggling to make all traversals and failing. Is there any way to prevent further traversals from Origin node if the 1 leaf node is found? All nodes in my cluster are connected to each other so at all times there is only 1 leaf node in a cluster so I can use that fact to have a query that works without failing.
09-24-2021 04:53 PM
So it's a directed graph, but not acyclic.
You could use subgraphNodes() from APOC path expanders, but you would need to pre-match to leaf nodes first, collect() them and use them as end nodes, and use a limit:1
so that it stops looking after a single path to a leaf node is identified.
Alternately, if using Neo4j 4.x, you can use subqueries to limit the expansion to only one leaf node:
MATCH (a:Ident)
CALL {
WITH a
MATCH (a)-[:ORIGID*]->(b:Ident)
WHERE NOT (b)-[:ORIGID]->()
RETURN b
LIMIT 1
}
RETURN a.name as aName, b.name as bName
ORDER BY aName
That will find only one leaf node per a node. It won't look for others.
The LIMIT can't be dynamic from a variable, but if you executed a query prior to this to find the number of leaf nodes ( MATCH (a:Ident) WHERE NOT (a)-[:ORGID]->() RETURN count(a)
), and you passed that value as a parameter to your query, then you could use RETURN DISTINCT b LIMIT $leafNodeCount
so it would stop searching when all leaf nodes have been found.
All the sessions of the conference are now available online