Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
10-19-2020 10:59 AM
Hello
I'm working with a graph database that represents public transport and I'm trying to calculate certain shortest paths through it. The following query takes 14 seconds to complete and I'd love any help on speeding it up:
MATCH (schulort:Haltestelle { BPUIC: 8502202 })
CALL gds.alpha.shortestPath.deltaStepping.stream({
nodeProjection: 'Haltestelle',
relationshipProjection: {
FAHRT_NACH: {
type: 'FAHRT_NACH',
properties: 'Reisedauer'
}
},
startNode: schulort,
relationshipWeightProperty: 'Reisedauer',
delta: 3.0
})
YIELD nodeId AS zielId, distance AS Reisedauer
WHERE gds.util.isFinite(Reisedauer)
WITH schulort, Reisedauer, gds.util.asNode(zielId) AS ziel
CALL gds.alpha.shortestPath.write({
nodeProjection: 'Haltestelle',
relationshipProjection: {
FAHRT_NACH: {
type: 'FAHRT_NACH',
properties: 'Ausfälle'
}
},
startNode: schulort,
endNode: ziel,
relationshipWeightProperty: 'Ausfälle'
})
YIELD nodeCount AS tempCount, totalCost AS Ausfälle
CALL gds.alpha.shortestPath.write({
nodeProjection: 'Haltestelle',
relationshipProjection: {
FAHRT_NACH: {
type: 'FAHRT_NACH',
properties: 'Verspätung'
}
},
startNode: schulort,
endNode: ziel,
relationshipWeightProperty: 'Verspätung'
})
YIELD nodeCount AS tempCount2, totalCost AS Verspätung
MATCH (ziel:Haltestelle)-[:BEFINDET_SICH_IN]->(o:Ortschaft)-[:GEHOERT_ZU]->(k:Kanton)
RETURN
k.Kürzel AS Kanton,
o.Name AS Ort,
k.Mietpreis AS Mietpreis,
k.Steuerfuss AS Steuerfuss,
k.Einbrüche AS Einbrüche,
min(Reisedauer) AS Reisedauer,
min(Ausfälle) AS Ausfälle,
min(Verspätung) AS Verspätung
The execution plan is the following:
One issue I see is that the anchoring query doesn't use an index. But I'm unsure why the rest takes so long. Some help would be appreciated.
10-21-2020 02:36 AM
Hey,
The first thing that I'd try is to create a named graph, rather than using an anonymous one in each query. If you use an anonymous graph it means that GDS is loading the same data from Neo4j into memory 3 times (one time per procedure).
CALL gds.graph.create(
'my-graph',
'Haltestelle',
{
FAHRT_NACH: {
type: 'FAHRT_NACH',
properties: ['Verspätung', 'Ausfälle', 'Reisedauer']
}
}
)
And then you can use that graph in the query:
MATCH (schulort:Haltestelle { BPUIC: 8502202 })
CALL gds.alpha.shortestPath.deltaStepping.stream("my-graph", {
startNode: schulort,
relationshipWeightProperty: 'Reisedauer',
delta: 3.0
})
YIELD nodeId AS zielId, distance AS Reisedauer
WHERE gds.util.isFinite(Reisedauer)
WITH schulort, Reisedauer, gds.util.asNode(zielId) AS ziel
CALL gds.alpha.shortestPath.write("my-graph", {
startNode: schulort,
endNode: ziel,
relationshipWeightProperty: 'Ausfälle'
})
YIELD nodeCount AS tempCount, totalCost AS Ausfälle
CALL gds.alpha.shortestPath.write("my-graph", {
startNode: schulort,
endNode: ziel,
relationshipWeightProperty: 'Verspätung'
})
YIELD nodeCount AS tempCount2, totalCost AS Verspätung
MATCH (ziel:Haltestelle)-[:BEFINDET_SICH_IN]->(o:Ortschaft)-[:GEHOERT_ZU]->(k:Kanton)
RETURN
k.Kürzel AS Kanton,
o.Name AS Ort,
k.Mietpreis AS Mietpreis,
k.Steuerfuss AS Steuerfuss,
k.Einbrüche AS Einbrüche,
min(Reisedauer) AS Reisedauer,
min(Ausfälle) AS Ausfälle,
min(Verspätung) AS Verspätung
Hopefully, that will reduce the query time, but let me know if not.
10-21-2020 05:08 AM
Thank you. The first time the fixed query is run it still takes about 12 seconds. The second time it only takes about 4 seconds. The execution plan still looks the same though with the same amount of db hits. Probably because the actual work is hidden inside those gds library calls? Is there a way of quantifying the costs of those java library calls?
I have since written another query for retrieving the same data but written with UNIONs. This query runs in 300ms but instead of 14'000 db hits it has 35'600 db hits. I find the first query to be simpler to grasp though. A few questions popped up in my mind:
Here's the second query:
CALL {
MATCH (schulort:Haltestelle { BPUIC: 8502202 }) // Hole Rotkreuz als Haltestelle.
CALL gds.alpha.shortestPath.deltaStepping.stream({ // Berechne kürzeste Dauer von Rotkreuz zu allen Haltestellen der Schweiz.
nodeProjection: 'Haltestelle',
relationshipProjection: {
FAHRT_NACH: {
type: 'FAHRT_NACH',
properties: 'Reisedauer'
}
},
startNode: schulort,
relationshipWeightProperty: 'Reisedauer',
delta: 3.0
})
YIELD nodeId AS zielId, distance AS Value
WHERE gds.util.isFinite(Value)
RETURN zielId, Value, 'Reisedauer' AS Typ
UNION
MATCH (schulort:Haltestelle { BPUIC: 8502202 }) // Hole Rotkreuz als Haltestelle.
CALL gds.alpha.shortestPath.deltaStepping.stream({ // Berechne kürzeste Dauer von Rotkreuz zu allen Haltestellen der Schweiz.
nodeProjection: 'Haltestelle',
relationshipProjection: {
FAHRT_NACH: {
type: 'FAHRT_NACH',
properties: 'Verspätung'
}
},
startNode: schulort,
relationshipWeightProperty: 'Verspätung',
delta: 3.0
})
YIELD nodeId AS zielId, distance AS Value
WHERE gds.util.isFinite(Value)
RETURN zielId, Value, 'Verspätung' AS Typ
UNION
MATCH (schulort:Haltestelle { BPUIC: 8502202 }) // Hole Rotkreuz als Haltestelle.
CALL gds.alpha.shortestPath.deltaStepping.stream({ // Berechne kürzeste Dauer von Rotkreuz zu allen Haltestellen der Schweiz.
nodeProjection: 'Haltestelle',
relationshipProjection: {
FAHRT_NACH: {
type: 'FAHRT_NACH',
properties: 'Ausfälle'
}
},
startNode: schulort,
relationshipWeightProperty: 'Ausfälle',
delta: 3.0
})
YIELD nodeId AS zielId, distance AS Value
WHERE gds.util.isFinite(Value)
RETURN zielId, Value, 'Ausfälle' AS Typ
}
WITH gds.util.asNode(zielId) AS ziel, Value, Typ
ORDER BY Value
MATCH (ziel:Haltestelle)-[:BEFINDET_SICH_IN]->(o:Ortschaft)-[:GEHOERT_ZU]->(k:Kanton)
WITH o, k, collect({Value: Value, Typ: Typ}) AS values
RETURN
k.Name,
o.Name,
k.Mietpreis,
k.Steuerfuss,
k.Einbrüche,
head([n IN values WHERE n.Typ = 'Reisedauer' | n.Value]) AS Reisedauer,
head([n IN values WHERE n.Typ = 'Ausfälle' | n.Value]) AS Ausfälle,
head([n IN values WHERE n.Typ = 'Verspätung' | n.Value]) AS Verspätung
And here's the corresponding execution plan:
All the sessions of the conference are now available online