10-03-2022 09:15 AM - edited 10-03-2022 09:17 AM
Hi! I'm currently working on a project using Python and Neo4j 3.5. I'm dealing with a large volume of data, and my goal is to find links between "important" nodes. I wrote the following query to find the shortest path between every pair of "important" nodes (considering only paths of length 1 or 2):
MATCH path = shortestPath( (n1)-[*..2]-(n2) )
WHERE n1:IMPORTANT and n2:IMPORTANT and id(n1)>id(n2)
RETURN path
To get the additional links between intermediate nodes, this result is complemented by a second query:
MATCH ()-[r]-()
RETURN r
The result of the second query was then filtered in Python to keep only the relationships between nodes returned by the first query.
I'm trying to improve the code so that the result can be obtained with a single query. I wrote the following query:
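For reference, the Python filtering step of this two-query approach can be sketched as follows. This is a minimal sketch, not the code from the original project: it assumes the first query's results have already been reduced to a set of node names and the second query's results to (start, end) name pairs, listing each relationship once. The names `path_nodes`, `all_rels` and `filter_relationships` are hypothetical, and the data is the toy graph from the setup below.

```python
def filter_relationships(path_nodes, all_rels):
    """Keep only relationships whose two endpoints were returned by the first query."""
    return [(a, b) for a, b in all_rels if a in path_nodes and b in path_nodes]

# Hypothetical data: names of the nodes found on the shortest paths (first query)
path_nodes = {"Emma", "David", "Paul", "Peter", "Jane", "Mary"}
# Hypothetical data: every relationship in the graph (second query), as name pairs
all_rels = [("Emma", "David"), ("David", "Paul"), ("David", "Jane"),
            ("Paul", "Peter"), ("Paul", "Jane"), ("Mary", "Jane"), ("John", "Emma")]

kept = filter_relationships(path_nodes, all_rels)
print(kept)  # every pair except ("John", "Emma")
```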
MATCH path = shortestPath( (s1)-[*..2]-(s2) )
WHERE s1:IMPORTANT and s2:IMPORTANT and id(s1)>id(s2)
WITH nodes(path) as nodeslist
MATCH p = (m)-[r]-(n)
WHERE m in nodeslist AND n in nodeslist AND id(m)>id(n)
RETURN p
However, this doesn't seem to work: the links between intermediate (non-important) nodes are not returned.
Here is a small setup to reproduce the problem:
MERGE (n1:IMPORTANT {name:'Emma'})
MERGE (n2:IMPORTANT {name:'David'})
MERGE (n3:IMPORTANT {name:'Peter'})
MERGE (n4:NEUTRAL {name:'Paul'})
MERGE (n5:IMPORTANT {name:'Mary'})
MERGE (n6:NEUTRAL {name:'Jane'})
MERGE (n7:NEUTRAL {name:'John'})
MERGE (n1) - [r1:KNOWS] - (n2)
MERGE (n2) - [r2:KNOWS] - (n4)
MERGE (n2) - [r3:KNOWS] - (n6)
MERGE (n4) - [r4:KNOWS] - (n3)
MERGE (n4) - [r5:KNOWS] - (n6)
MERGE (n5) - [r6:KNOWS] - (n6)
MERGE (n7) - [r7:KNOWS] - (n1)
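[Figure: the full graph (edges are undirected)]
Expected result: the six nodes on the shortest paths (Emma, David, Paul, Peter, Jane and Mary) together with all six relationships between them, including the one between Paul and Jane.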
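To see what the first half of the query returns on this data, here is a minimal pure-Python sketch that mimics shortestPath( (s1)-[*..2]-(s2) ) with a breadth-first search limited to two hops. It is a model under stated assumptions, not Neo4j's implementation: the graph is stored in a hypothetical adjacency map built from the MERGE statements, and when several shortest paths of equal length exist, Neo4j returns only one of them, which may differ from the one this search picks (the toy graph has no such ties).

```python
from collections import deque
from itertools import combinations

# Toy graph from the MERGE statements, stored as an undirected adjacency map (assumed model)
LABELS = {"Emma": "IMPORTANT", "David": "IMPORTANT", "Peter": "IMPORTANT",
          "Paul": "NEUTRAL", "Mary": "IMPORTANT", "Jane": "NEUTRAL", "John": "NEUTRAL"}
EDGES = [("Emma", "David"), ("David", "Paul"), ("David", "Jane"), ("Paul", "Peter"),
         ("Paul", "Jane"), ("Mary", "Jane"), ("John", "Emma")]

ADJ = {name: set() for name in LABELS}
for a, b in EDGES:
    ADJ[a].add(b)
    ADJ[b].add(a)

def shortest_path(start, goal, max_hops=2):
    """Breadth-first search for one shortest path of at most max_hops relationships."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) - 1 == max_hops:
            continue  # extending this path would exceed the hop limit
        for nxt in sorted(ADJ[path[-1]]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no path of length <= max_hops

important = sorted(n for n, label in LABELS.items() if label == "IMPORTANT")
paths = []
for a, b in combinations(important, 2):  # each unordered pair once, like id(s1) > id(s2)
    p = shortest_path(a, b)
    if p is not None:
        paths.append(p)
print(paths)
```

Only three pairs of important nodes are within two hops of each other; Emma, Peter and Mary are at least three hops apart from one another, so those pairs produce no path.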
Python code:
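from neo4j import GraphDatabase

# Placeholder connection settings (assumed values); replace with your own
uri = "bolt://localhost:7687"
username, password = "neo4j", "password"

driver = GraphDatabase.driver(uri, auth=(username, password))
query = """MATCH path = shortestPath( (s1)-[*..2]-(s2) )
WHERE s1:IMPORTANT and s2:IMPORTANT and id(s1)>id(s2)
WITH nodes(path) as nodeslist
MATCH p = (m)-[r]-(n)
WHERE m in nodeslist AND n in nodeslist AND id(m)>id(n)
RETURN p"""
with driver.session() as session:
    # graph() gathers all nodes and relationships from the returned paths
    graph = session.run(query).graph()
    print([n["name"] for n in graph.nodes])
    print(["-".join([n["name"] for n in r.nodes]) for r in graph.relationships])
driver.close()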
Result: 6 nodes and 5 relationships (instead of 6 relationships). The edge between Paul and Jane is missing.
['David', 'Emma', 'Paul', 'Peter', 'Jane', 'Mary']
['Emma-David', 'Paul-Peter', 'David-Paul', 'David-Jane', 'Mary-Jane']
Is something wrong with my query? The Neo4j version is 3.5 and the neo4j Python driver version is 4.4.4.
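The missing edge can be reproduced outside Neo4j. In Cypher, WITH nodes(path) as nodeslist keeps one row per shortest path, and the following MATCH is evaluated once for each incoming row, so m and n must both lie on the same path. The sketch below models that per-row behaviour in plain Python; it assumes the three paths found on the toy graph, treats each relationship as an unordered pair of node names, and the helper name per_row_matches is hypothetical.

```python
# Toy graph relationships as undirected name pairs (from the MERGE statements)
EDGES = [("Emma", "David"), ("David", "Paul"), ("David", "Jane"), ("Paul", "Peter"),
         ("Paul", "Jane"), ("Mary", "Jane"), ("John", "Emma")]
# The three shortest paths between IMPORTANT nodes: one row each after WITH
PATHS = [["David", "Emma"], ["David", "Paul", "Peter"], ["David", "Jane", "Mary"]]

def per_row_matches(paths, edges):
    """Mimic the second MATCH running once per row: both ends must be in the same path."""
    found = set()
    for nodeslist in paths:  # one iteration per row produced by WITH
        for a, b in edges:
            if a in nodeslist and b in nodeslist:
                found.add(frozenset((a, b)))  # undirected, so pair order is ignored
    return found

rels = per_row_matches(PATHS, EDGES)
print(len(rels))                            # 5
print(frozenset(("Paul", "Jane")) in rels)  # False
```

Paul only appears on the David-Paul-Peter path and Jane only on the David-Jane-Mary path, so no single row ever contains both of them.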
10-03-2022 10:28 AM
You can get the relationships along a path with relationships(path). Each relationship returned includes the ids of its start and end nodes, its properties and its type.
10-04-2022 12:42 AM
Thank you for your answer, but I don't see how it solves my issue.
The following query gives exactly the same result: the relationship between Jane and Paul is still missing.
MATCH path = shortestPath( (s1)-[*..2]-(s2) )
WHERE s1:IMPORTANT and s2:IMPORTANT and id(s1)>id(s2)
WITH nodes(path) as nodeslist
MATCH p = (m)-[r]-(n)
WHERE m in nodeslist AND n in nodeslist AND id(m)>id(n)
RETURN relationships(p), nodes(p)
10-04-2022 01:33 AM
I solved my issue with this query:
MATCH path = shortestPath( (s1)-[*..2]-(s2) )
WHERE s1:IMPORTANT and s2:IMPORTANT and id(s1)>id(s2)
WITH nodes(path) as nodeslist
UNWIND nodeslist as n
WITH COLLECT(DISTINCT n) as flatnodeslist
MATCH p = (m)-[r]-(n)
WHERE id(m)>id(n) AND n in flatnodeslist AND m in flatnodeslist
RETURN p
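Because the first MATCH produces one row per shortest path, nodes(path) gives a separate list of nodes for each path, and the second MATCH is evaluated against each of those lists in turn. No single list contained both Jane and Paul, so their relationship was never matched. To get one list of all the nodes, I had to flatten the per-path lists with UNWIND and COLLECT(DISTINCT ...).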
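The flattening step (UNWIND followed by COLLECT(DISTINCT ...)) can be modelled in the same way: every path's nodes are merged into one collection, and the relationship check then runs once against that single collection. Again, this is a minimal sketch on the toy data, assuming relationships are unordered pairs of node names; flattened_matches is a hypothetical name.

```python
# Toy graph relationships as undirected name pairs (from the MERGE statements)
EDGES = [("Emma", "David"), ("David", "Paul"), ("David", "Jane"), ("Paul", "Peter"),
         ("Paul", "Jane"), ("Mary", "Jane"), ("John", "Emma")]
# The three shortest paths between IMPORTANT nodes
PATHS = [["David", "Emma"], ["David", "Paul", "Peter"], ["David", "Jane", "Mary"]]

def flattened_matches(paths, edges):
    """Mimic UNWIND + COLLECT(DISTINCT n): merge every path's nodes into one collection first."""
    flatnodes = {node for nodeslist in paths for node in nodeslist}
    # Only one row remains, so any relationship between two collected nodes qualifies
    return {frozenset((a, b)) for a, b in edges if a in flatnodes and b in flatnodes}

rels = flattened_matches(PATHS, EDGES)
print(len(rels))                            # 6
print(frozenset(("Paul", "Jane")) in rels)  # True
```

John is the only node left out, because he is not on any of the shortest paths, so the John-Emma relationship is the only one excluded.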
10-04-2022 12:09 PM
I believe your solution has an error in it. If you execute the following query, it lists the nodes along each path that was found:
MATCH path = shortestPath( (s1)-[*..2]-(s2) )
WHERE s1:IMPORTANT and s2:IMPORTANT and id(s1)>id(s2)
return nodes(path)
These three paths (Emma-David, David-Paul-Peter and David-Jane-Mary) contain five relationships in total, but the result of your query contains six.
The extra one is the relationship between Jane and Paul. It must come from line six of your query (MATCH p = (m)-[r]-(n)). That MATCH looks for relationships between any two nodes that are members of the paths, and it overlooks the fact that these nodes can have relationships that are not part of the path results.
You can get the results you want from the following query. Note that the DISTINCT is not needed for the test data, but may be needed for a more general data set.
match path = shortestPath( (s1)-[*..2]-(s2) )
where s1:IMPORTANT and s2:IMPORTANT and id(s1)>id(s2)
unwind relationships(path) as rel
with distinct rel
return startNode(rel), endNode(rel), labels(startNode(rel)), labels(endNode(rel))
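What this query returns can be modelled in plain Python as the distinct set of consecutive node pairs along each path. This is a minimal sketch assuming the three paths found on the toy graph and no parallel relationships between the same two nodes (true for this data); path_relationships is a hypothetical name, and the Python set stands in for DISTINCT.

```python
# The three shortest paths between IMPORTANT nodes on the toy graph
PATHS = [["David", "Emma"], ["David", "Paul", "Peter"], ["David", "Jane", "Mary"]]

def path_relationships(paths):
    """Mimic UNWIND relationships(path) + DISTINCT: consecutive node pairs along each path."""
    rels = set()
    for nodeslist in paths:
        for a, b in zip(nodeslist, nodeslist[1:]):
            rels.add(frozenset((a, b)))  # the set removes duplicates, like DISTINCT
    return rels

rels = path_relationships(PATHS)
print(len(rels))  # 5
```

The Paul-Jane relationship never appears here, because it does not lie on any of the shortest paths.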
10-05-2022 02:22 AM
But, as I mentioned in my first post, the point was to get the shortest paths between important persons AND the additional relationships between the nodes returned in those shortest paths.
So in this case I want the paths Emma->David, David->Paul->Peter and David->Jane->Mary, and since Paul and Jane are both among the returned nodes and are related, I also want to show the link between Paul and Jane.
The first query was enough to get every node on the shortest paths; my goal was to add the other existing relationships between the returned nodes, such as the one between Jane and Paul.