Neo4j

marek_kasle_na · ‎11-05-2020

For example

MATCH path = shortestPath((s {http://dbpedia.org/ontology/uri:"http://identifiers.org/uniprot/P0DTC2-PRO_0000449649"})-[*]-(humanprot)) WHERE humanprot.http://dbpedia.org/ontology/uri IN ["http://identifiers.org/hgnc/15022"] RETURN path

finishes in about a second, while

MATCH (s {http://dbpedia.org/ontology/uri:"http://identifiers.org/uniprot/P0DTC2-PRO_0000449647"})-[*]-(humanprot) WHERE humanprot.http://dbpedia.org/ontology/uri IN ["http://identifiers.org/hgnc/15022"] RETURN * LIMIT 1

runs for ages and is still running while I am writing this.

Why if I remove the shortestPath() function from the query and put there LIMIT 1 instead, it becomes suddenly so much more complicated? Intuitively, it seems that finding any path should be faster than finding shortest path.

andrew_bowman · ‎11-06-2020

Actually neither of these is guaranteed, and I'd put it up to luck that one ran faster than another.

You're not using labels here, which is the major problem, so this is starting with an AllNodesScan, and in one of the queries it just happened to be the first to stumble upon a node that was the start of one of these paths, and (for shortest path) happened to find first via the cartesian product (another all nodes scan) another node for which the path exists.

You need to use labels in your patterns, and you need to create indexes or unique constraints to support even faster lookup.

Neo4j

Why is shortestPath() faster than "find 1 random path"?