Neo4j

andyking27 · ‎01-20-2021

We have a large network stored in v3.4.8 that incorporates nodes representing Companies and People. The edges between the nodes represent Appointments (i.e. Person 1 works at Company A).
Some of the People nodes are actually companies who function as if they are People (and are stored in the graph with a label of 'Is Company' = 1).

We are calculating the shortest path between companies using the built-in shortestPath cypher function -
match (start:Company {id:1}),(end:Company {id:100}), p = shortestPath((start)-[:APPOINTMENT*]-(end)) return *

What we want to do now is return the shortest path that doesn't involve a People node where 'Is Company' = 1.

I have tried using queries like where ALL(x in nodes(p) WHERE x.Is Company = 0) but performance is awful even with an index on 'Is Company'.

tard_gabriel · ‎01-20-2021

WHERE NONE (n IN nodes(p) WHERE n.Is Company = 1) ?

OR SOMETHING DIFFERENT IN PURE OLD SCHOOL CYPHER
DO NOT FORGET TO INDICATE A RELATIONSHIP DIRECTION IF YOU CAN ( OR TIME WILL DOUBLE )

MATCH path = (s:Company {id:1})-[:APPOINTMENT*]->(e:Company {id:100})
RETURN path AS Path, length(path) AS Length ORDER BY Length

Not quite perfect this last one but can give you a kind of debugger.

Or you can use the PROFILE clause on the shortestPath one to see what is happening there.

andyking27 · ‎01-28-2021

We got the following cypher to work -
match (start:Company {id:1}),(end:Company {id:100}), p = shortestPath((start)-[:APPOINTMENT*]-(end)) WHERE ALL(x in nodes(p) where x.'Is Company' = 0 OR x.'Is Company' IS NULL) return *

However, we also have a query similar to the following -
match (start:Company {id:1}),(end:Person), p = shortestPath((start)-[:APPOINTMENT*]-(end)) where end.id IN [<list of comma separated ids>] return * order by length(p) limit 1 which is designed to get the shortest path from a user-defined list of people nodes. This list has no limit on the number of entries. When trying to apply the same logic from the first query the performance is awful and, depending on the length of the list, can cause the server to fall over. The query is -
match (start:Company {id:1}),(end:Person), p = shortestPath((start)-[:APPOINTMENT*]-(end)) where ALL(x in nodes(p) where x.'Is Company' = 0 OR x.'Is Company' IS NULL) and end.id IN [<list of comma separated ids>] return * order by length(p) limit 1

Can anyone suggest any alternatives where the performance is not as bad? There are indexes on the node.id labels and the 'Is Company' label.

clem · ‎01-28-2021

I think the problem you are falling into is that the number of nodes matched (end:Person) is huge. I think you need to filter out the end matches. If you do PROFILE on the query, you'll see what I mean.

I'll confess that I haven't used shortestPath(), so I'm not 100% sure that this will work, but I tried this somewhat analogous (and simpler) query with the movie DB and it worked:

MATCH(start:Person) WHERE start.name = "Rob Reiner"
// limited possible ends:
MATCH (end:Person) WHERE end.name STARTS WITH "Michael" 
WITH shortestPath((start)-[*]-(end)) AS p
RETURN p ORDER BY length(p)

So, I suggest trying something like this (although I'm almost sure I got the syntax screwed up after the WITH.)

MATCH (start:Company {id:1})
MATCH (end:Person) WHERE end.id IN [<list of comma separated ids>]
// the following part needs fixing!
WITH shortestPath((start)-[:APPOINTMENT*]-(end)) where ALL(x in nodes(p) where x.'Is Company' = 0 OR x.'Is Company' IS NULL) 
 return * order by length(p) limit 1

It would be helpful if this documentation includes complex conditionals using WHERE for start and end nodes plus relationships:

(I gave some feedback on this doc page.).

andrew_bowman · ‎02-01-2021

It sounds like you're trying to find which of those persons with the given ids is closest to the company.

If you have APOC Procedures you can use the path expander to discover this. We will need to pre-match to the persons and collect() them into a list, and submit them as the end nodes of the expansion, and that we only want 1 result. Because breadth-first expansion is used, it will find the single closest result. Keep in mind though that this only expands in one direction, unlike the shortestPath() function.

That said, the path expander procs don't do filtering of properties during expansion, but it can blacklist labels, including compound labels. So if the People/Company nodes are labeled as both :Person and :Company, then we can blacklist :Person:Company nodes easily.

MATCH (end:Person)
WHERE end.id IN [2, 5]
WITH collect(end) as endPersons
MATCH (start:Company {id:1})
CALL apoc.path.subgraphNodes(start, {relationshipFilter:'APPOINTMENT', endNodes:endPersons, limit:1, labelFilter:'-Person:Company'}) YIELD node
RETURN node

Note that this will traverse any other kind of node, include just :Person nodes, or just :Company nodes. You can tweak the labelFilter if you require additional restrictions.

wumirose · ‎06-04-2022

I needed to do a similar thing so I found on StackOverflow , the following solution.

The returned collection of paths does not contain any path that contains an Company node whose id is equal to 1:

MATCH p=allShortestPaths( (a:Company {id:1})-[*]-(b:Company {id:100}) )
WHERE NONE(x IN NODES(p) WHERE x:Company AND x.id = 1)
RETURN p;

Neo4j

Shortest Path between nodes excluding nodes with a certain label