Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
11-28-2022 04:11 AM
Hello Neo4j Community,
How do I find multiple distinct short paths between 2 nodes in a graph with 7.5M nodes and 20M relationships?
We want a feature similar to how google maps shows other alternative routes.
problems with:
Dijkstra, shortestPath and allShortestPaths:
Only returns the shortest path or paths with the shortest length.
Yen's k shortest paths:
Absurdly slow on a big graph
Iterate over list of numbers 0-10 and call allShortestPaths with minimum number of length of i:
Absurdly slow on a big graph
01-02-2023 05:04 AM
After some time, I figured that filtering/blacklisting for certain relationships that already has been used for another path could work. However, when I tried this in practice i noticed that filtering after running allShortestPaths wasn't a viable solution. After this I figured doing allShortestPaths mulitple times and when a path is found I could rename the type to currentTypeName_TEMP, then rename it after again.
To find multiple distinct paths, this could be done with an algorithm shown on the picture, and given the effectiveness of allShortestPaths, this will not be that computational or time intensive (dotted lines means found relationships).
01-02-2023 05:47 AM
look At the apoc.path procedures, expandOath in particular.
https://neo4j.com/labs/apoc/4.1/overview/apoc.path/apoc.path.expandConfig/
01-02-2023 06:32 AM
@glilienfield we previously looked at expand config, however, it seems like it expands all nodes with N hops from startNode, which is very inefficient when dealing with the size of our database and also when there is no guarantee that a path exist. Am I missing something?
01-02-2023 06:41 AM
You can specify a target node, which looks like your end node, so you can find a path between two nodes. All the algorithms will need to start from a point and traverse the graph looking for the end node. The more nodes and relationships you can filter out the more efficient the search will be.
You could always build your own algorithm as a custom procedure. You will need to deploy it on your server(s) though.
01-02-2023 06:50 AM
Yes we are aware, however, expand config is not as fast at searching as allShortestPaths.
This is just too much effort for something that could be pretty easy. I think we just implement my provided solution.
01-02-2023 07:31 AM
Hi @MathiasGraabeck
Can you tell me more about what you mean by "multiple distinct short paths between 2 nodes"
0 - What is the domain of your graph ( What's the graph is talking about roughly )
1 - What is, in english ( no Cypher or apoc ), the question you are trying to solve
2 - Are you looking for only 2 specific nodes amung all of them when you ask the question to your grah, or you are looking at all possible pair of nodes in your graph?
3 - For this or these pairs, do you have weight in your relationship?
You can see weight as beign the time it takes to go from node A to node B, in your provided example it could be a a property inside a relationship directly connected road corners ( nodes ). This could depend on the size of the road, condition etc. Using weight your path lenght is not simply define by the number of nodes crossed along the way, but by the weight of their relationship.
Can you define a bit more what you mean by weight?
If if this what you are trying to archive, and I think so, it does't seem you have specify to use weight in your previous attemp. Can we see your queries?
01-02-2023 09:20 AM
Hello @tard_gabriel,
0 - What is the domain of your graph
We have a graph with data of many companies, and we are dealing with fraud prevention. To learn more about MAP Group; the company I'm working for, please visit:
LinkedIn: https://www.linkedin.com/company/mapg/
website: https://mapgroup.dk/
1 - What is the question I'm trying to solve
Finding multiple shortest paths. This includes all shortest paths, but also the next shortest paths, and maybe even the next shortest paths. This should be possible with any length, with max N hops, or if there are too many paths, then stop getting paths when a collection of paths with a size of M is reached. Where N and M could be any number.
The reason for this is because, we want to give a bigger overview than using just allShortestPaths, but not the complete overview, for finding every path with N hops from the start node, and then hopefully reach the endNode.
2 - Are you looking for only 2 specific nodes among all of them when you ask the question to your grah, or you are looking at all possible pair of nodes in your graph?
My solution are just parsing the start and end node. I don't really care if I get all possible node pairs, I only care about performance.
3 - For this or these pairs, do you have weight in your relationship?
No, the reason being that there is no need for weights in my case, all hops will be of weight=1.
4 - Can you define a bit more what you mean by weight?
I did not mention weights.
5 - Can we see your queries?
My solution is at this point just pseudo code, and I think it might contain too much private information to share. However, the result of the steps in the algorithm is shown in the picture of my solution.
To clarify the image for my solution:
using allShortestPaths, paths (in this case only looking at the relationships) that are found: [[1,5], [2,4]]
then excluding the 1 and 4 relationships by renaming their types to relName_TEMP.
then allShortestPaths again will find path: [2,3,5]
then restoring the type names of 1 and 4, and excluding the 2 and 5 relationships.
then allShortestPaths again will find path: [1,3,4]
relationships.
Then there is no more known relations therefore exclude all known relationships to this point (1,2,3,4,5).
then by using allShortestPaths again will reveal path: [6,7,8,9]
leaving the end result of (only looking at relationships) [[1,5], [2,4], [2,3,5], [1,3,4], [6,7,8,9]].
Then the algorithm is stopped since a desirable amount of paths is found. However, many more paths could be available in the much larger graph, and this is just the tiny snip of nodes+relationships that we know from the result of the allShortestPath algorithm.
I hope this helps to see the case and the problem, otherwise please don't hesitate to reach out again.
Best wishes,
Mathias Graabeck, Software engieneer @ MAP Group
All the sessions of the conference are now available online