Yen's Shortest Path throws error when applied to projection without parallel edges
‎02-02-2023 05:50 AM
I'm trying to get the top x unique shortest node paths. If I use a simple projection:
CALL gds.graph.project("graphSingleType", "*", {TYPE: {type: "*"}})
Yen's will return the same node path multiple times having taken different parallel edges between the nodes.
So I make a graph projection without parallel edges:
CALL gds.graph.project("graphSingleType_NP", "*", {TYPE:{type:'*', aggregation:"SINGLE"}})
The problem is that Yen's then only works when k=1 (ie when Dijkstra's is used). For all k>1, The following error is thrown:
Failed to invoke procedure `gds.shortestPath.yens.stream`: Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 0 out of bounds for length 0
I found the issue in GDS version 2.1 so I updated to 2.3 but the problem is still there.
Am I not understanding something or is this an actual bug? Alternatively, is there another way I can get the top x unique node paths?
To reproduce:
CREATE (a:CITY),
(b:CITY),
(c:CITY),
(d:CITY),
(e:CITY),
(f:CITY),
(a)-[:ROAD]->(b),
(a)-[:ROAD]->(b),
(b)-[:ROAD]->(c),
(b)-[:ROAD]->(d),
(c)-[:ROAD]->(f),
(d)-[:ROAD]->(e),
(e)-[:ROAD]->(c),
(e)-[:ROAD]->(f),
(a)-[:PATH]->(b),
(d)-[:PATH]->(e),
(d)-[:PATH]->(e)
MATCH (source), (target) WHERE id(source)=0 AND id(target)=5
CALL gds.shortestPath.yens.stream("graphSingleType_NP",
{sourceNode:source,
targetNode:target,
k:3})
‎02-03-2023 10:04 AM
I found a work around: use a cypher projection rather than a native projection.
From the example above, I get the following graph:
I then create my graph projection, "graphSingleNodeType_NP", a single graph without parallel edges.
CALL gds.graph.project.cypher(
'graphSingleType_NP',
'MATCH (n) RETURN id(n) AS id',
'MATCH (n)-[r]->(m) RETURN DISTINCT id(n) AS source, id(m) AS target'
)
The projection removes all parallel edges, in the same way the native projection did.
I then run Yen's algorithm.
MATCH (source), (target) WHERE id(source)=0 AND id(target)=5
CALL gds.shortestPath.yens.stream("graphSingleType_NP",
{sourceNode:source,
targetNode:target,
k:10}) YIELD nodeIds RETURN nodeIds
And get the desired results: in this case the top 10 unique node paths. However, since no more than 3 unique node paths exist between nodes 0 and 5, only 3 paths are returned.
It's probably worth mentioning that I found Native projections to be notably faster than Cypher Projections when used on larger Neo4j databases.
Also, I found this reference to be very helpful in understanding GDS graph projections https://tbgraph.wordpress.com/2020/03/30/analyzing-multigraphs-in-neo4j-graph-data-science-library for
‎02-03-2023 11:30 AM
Thank you for the nice write up and reproducible example. This looks like a bug to me, one that is related to the parallel aggregation. I will go ahead and report it.
For future reference, we consider most java.lang.* exceptions to be bugs. Feel free to report them here: https://github.com/neo4j/graph-data-science/issues .
You are correct that native projections are significantly faster for large graphs than Cypher projections.
‎02-06-2023 02:26 AM
Hi @Jammer ,
Thank you for bringing this issue to our attention. We have identified the issue, we are working on a fix. We will let you know when a patch with the fix is released.
Best regards,
Ioannis.
‎02-06-2023 08:21 AM
What I have actually ended up doing:
So for my real database, a cypher projection is too slow (I gave up after 40 minutes). However, GDS 2.3 has added a handy function "toUndirected". Since I want the option of finding the shortest directed or undirected paths, this function handles half the battle for me. For now, I'll just use Dijkstra's for the directed option.
#1) Make a single graph projection with one edge type (one rel type, no parallel edges)
CALL gds.graph.project("graphSingleType_NP", "*", {TYPE:{type:'*', aggregation:"SINGLE"}})
#2) Add and "undirected" edge type to the projection
CALL gds.beta.graph.relationships.toUndirected(
'graphSingleType_NP',
{relationshipType: 'TYPE', mutateRelationshipType: 'TYPE_UD'}
)
YIELD inputRelationships, relationshipsWritten
RETURN inputRelationships, relationshipsWritten
#3) Calculating Yen's on the undirected edge type works
MATCH (source), (target) WHERE id(source)=31 AND id(target)=26
CALL gds.shortestPath.yens.stream("graphSingleType_NP",
{sourceNode:source,
targetNode:target,
k:3,
relationshipTypes:['TYPE_UD']
}) YIELD nodeIds RETURN nodeIds
#4) Calculating Yen's on the original type still fails:
MATCH (source), (target) WHERE id(source)=31 AND id(target)=26
CALL gds.shortestPath.yens.stream("graphSingleType_NP",
{sourceNode:source,
targetNode:target,
k:3,
relationshipTypes:['TYPE']
}) YIELD nodeIds RETURN nodeIds
And thanks for the responses. Nice knowing this will probably be fixed in the future. Cheers!