cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Performance issue with shortestPath on cypher query

Hello,

I am using Neo4j (3.5) via Neo4j Desktop (1.2.4) and I am trying to run a simple cypher query in order to select every user that connects to a particular user directly or through another user (2nd depth) with 3 different relationships.

The query is as follows:

profile
Match path = shortestPath((U1:DemoUser{id:”abc”})-[:Rel1|Rel2|Rel3*..2]->(U2:User))
where U1.id <= <max_user_id> and U2.id <= <max_user_id> and U1<>U2
return length(path) as numberOfHops, U2

Το evaluate my design performance, I am trying a stress test. For this, I have loaded neo4j with scipt-generated data:
And this is what I get:

Cypher version: CYPHER 3.5, planner: COST, runtime: INTERPRETED. 200004 total db hits in 20633 ms

I am using the where clause, because I have indexed my user ids .
I am including some tables depicting the current state of the database:

VM
Cores 2
Ram 32 GB
Neo4j Config
Heap 8 GB
Max_Heap 8 GB
Pagecache 10 GB
Number of Rels
Rel1 25.000.000
Rel2 15.000.000
Rel3 100.000
Data for specific User
Id abc
Rel1 191
Rel2 1
Rel3 280
2nd depth rels 200.000

This query returns 44.362 distinct users in more than 20 seconds, which to me seems like a lot! If you have any suggestions regarding the configuration/specs of the database/machine, or any optimizations regarding my query, please let me know on the comments below.

*The query’s performance does not get any better no matter how many times I run it in a row. I didn’t notice any difference between hot - cold start.

Thank you,
Lefteris

4 REPLIES 4

I believe you're using the wrong approach here by using shortestPath(), explanation below:

Be aware that when you use shortestPath(), it will MATCH for both start and end node variables according to your predicates, then attempt to find the shortest path for the resulting rows.

So this is not starting from U1 and expanding until it finds users according to the path and predicate you specified.

Instead (and you can see this in your PROFILE plan), it is performing an index lookup for U1, then using an IndexSeekByRange to get all :User nodes with an id <= max user id, and there are 100,000 of them. Then for each and every of those 100,000 pairs of nodes, it will execute a separate shortestPath() expansions to get the shortest path for the pair on that row.

If that is not what you want to do, then do not use shortestPath() for this. ShortestPath is for when you already know the start and end nodes and want the path between. You should not use it when you do not already know both the start and end node.

Hello Andrew,

Thank you very much for your response!
I have run some experiments in orientdb and I have found the function traversedVertex, which returns all nodes given a preferable depth.
Is there any similar function in neo4j or do you suggest any imrovement?

One of my thoughts was to run a query like the following:

Match (u1:DemoUser{id:"abc"})
Match p = (u1)-[:Rel1|Rel2|Rel3 *..2]->(u2)
Where u1 <> u2 and length(p)<3
Return distinct u2.id as id, length(p) as numberOfHops

That works.

Another option is using APOC path expander procedures, which can be more performant in some cases (you will want to test) when it's a highly interconnected graph:

MATCH (u1:DemoUser{id:"abc"})
CALL apoc.path.spanningTree(u1, {maxLevel:2, relationshipFilter:'Rel1>|Rel2>|Rel3>', labelFilter:'>User'}) YIELD path
RETURN last(nodes(path)) as U2, length(path) as numberOfHops

This should get shortest paths to :User nodes following the given outgoing relationships up to 2 hops away. It uses breadth first expansion (which gives you shortest path), and NODE_GLOBAL uniqueness, meaning that each node will only be visited once (any later paths to an already visited node will be discarded).

Yes your query seems to be more performant.

2X_9_9c108b592a5d576631d9b528698562388f007291.png

Cypher version: CYPHER 3.5, planner: COST, runtime: INTERPRETED. 44366 total db hits in 962 ms.

But I think that it is still too slow. I am waiting 962ms in order to get 44.362 records.

Are there any changes that I should do to my configuration in order to improve the performance?