cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Find random path of length n between two nodes and remove cycles

I have a large well-connected graph with about 50 million nodes and 350 million relationships. All relationships are directional. I would like to efficiently find a path of length n between any two nodes in the graph and remove cycles from the path.

I have figured out how to find a path of length n, though it isn't very fast:

MATCH path=(start:Node {name:'nodeA'})-[:CONNECTS_TO*19..20]->(end:Node {name:'nodeB'})
RETURN path LIMIT 1

But I would like to find a way to remove cycles and flatten out the subgraph so that each node only points to one other node. I saw that there is a apoc.nodes.cycles method that can detect cycles, but I don't know how to pass the results of the above query in and remove cycles after they're detected.

1 REPLY 1

@ackum

If I understood well, you want to delete ALL cycles found in the path matched.
In this case, yes you could use the apoc.nodes.cycles procedure,
for example executing :

MATCH path=(start:Node {name:'nodeA'})-[:CONNECTS_TO*19..20]->(end:Node {name:'nodeB'})
WITH path LIMIT 1 // limit path number
WITH nodes(path) as pathNodes // we retrieve all nodes in the path
CALL apoc.nodes.cycles(pathNodes)  // for each path we found cycles
YIELD path 
WITH nodes(path) as nodes  // we retrieve all nodes in the cycle-path
UNWIND nodes[1..size(nodes) - 1] as node // unwind the nodes EXCEPT the start cycle node
DETACH DELETE node // remove all other nodes found

Anyway, I don't know why you used LIMIT 1, perhaps not to make the query too heavy?
In this case you could remove the WITH path LIMIT 1 and use the apoc.periodic.iterate to batch the result:

CALL apoc.periodic.iterate("MATCH path=(start:Node {name:'nodeA'})-[:CONNECTS_TO*19..20]->(end:Node {name:'nodeC'}) RETURN path",
"WITH nodes(path) as pathNodes // we retrieve all nodes in the path
CALL apoc.nodes.cycles(pathNodes)  // for each path we found cycles
YIELD path 
WITH nodes(path) as nodes  // we retrieve all nodes in the cycle-path
UNWIND nodes[1..size(nodes) - 1] as node // unwind the nodes EXCEPT the start cycle node
DETACH DELETE node", {}) // remove all other nodes found

However, first of all, I recommend you to test the query with a small test dataset / backup it, because this one could be very dangerous.

Another thing, to search path you could also use the apoc.path.expandConfig,
to customize the search, for example:

PROFILE MATCH (end:Node {name:'nodeB'})
WITH collect(end) as terminatorNodes // collect all terminatorNodes
MATCH (start:Node {name:'nodeA'})
WITH start, terminatorNodes
CALL apoc.path.expandConfig(start, {
    relationshipFilter: "CONNECTS_TO",
    minLevel: 19,
    maxLevel: 20,
    terminatorNodes: terminatorNodes
})
YIELD path
RETURN path;