Neo4j

julian_n_de_jon · ‎10-14-2020

Hi there,

I've got a database with 18k nodes and 200m relationships. I've used one of the node's properties ("User_ID") as the ID, and I've created a unique constraint on this property.

When I run a query to get 2 random nodes by their User IDs, the results return quickly:

MATCH (m {User_ID: '1000'}), (n {User_ID: '1232'}) 
RETURN m,n

However, when I try to return all relationships between these 2 nodes, the query takes ~20 minutes to run the first time:

MATCH (m {User_ID: '1000'})-[r]-(n {User_ID: '1232'})
RETURN m,n,r

These relationships have 7 properties each. How can I improve the performance of this query? I've read that it's generally a bad idea to have properties with relationships; is this why? I'm unable to find any material on indexing relationships - I thought they would have a clustered index applied to them by default.

I'd like to keep the properties on the relationships as turning these into nodes would clutter the graph, unless this is the sole reason why it's not performing.

Specs:

neo4j version 4.1.0
heap initial and max size: 8GB

Many thanks,

Nick

tarendran_vivek · ‎10-14-2020

Hello Nick,

This is because you are not using the index that you have created. You have to add a label to the nodes you are matching in order to use the index. Otherwise, your query will perform an AllNodesScan, which is very bad for performance.

MATCH (m:Label {User_ID: '1000'})-[r]-(n:Label {User_ID: '1232'})
RETURN m,n,r

You should take a look at query tuning using this documentation for starters.
Hope this helps.
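
One way to check whether the index is actually being picked up is to prefix the query with EXPLAIN, which shows the planned operators without running the query. A minimal sketch, assuming the single node label is called `User` (a hypothetical name; the thread does not state the real label):

```cypher
// Assumption: the node label is named User; substitute the real label.
// EXPLAIN shows the planned operators without executing the query.
EXPLAIN
MATCH (m:User {User_ID: '1000'})-[r]-(n:User {User_ID: '1232'})
RETURN m, n, r
```

If the plan begins with NodeUniqueIndexSeek rather than AllNodesScan, the index backing the unique constraint is being used for the lookup.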

julian_n_de_jon · ‎10-14-2020

Hi Tarendran,

Thanks for your reply. I should have specified - I only have one node label and one link label at the moment, so unfortunately specifying these hasn't made much of a difference.

I've looked at the profiler, and unsurprisingly, the "Expand(into)" operation seems most taxing, producing 22,211 db hits. Is there a way to reduce this? Other operations include 2x NodeUniqueIndexSeek, CartesianProduct and ProduceResults, but these aren't nearly as bad as the Expand(into) operation.

Thank you for your help so far
Nick

tarendran_vivek · ‎10-14-2020

Hi Nick,

Since you are looking for all the relationships between these two nodes, the operation has to expand the relationships of a node in order to traverse the graph. You might want to take a look at your graph model. The basic idea behind graph modelling is to determine what questions you will be asking and what the answers should be. Based on this, you will want to reach the answer with the shortest possible graph traversal (take note that this may not always be achievable).

Best of luck
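
To get a feel for how much work the expansion has to do, it may help to look at how many relationships each of the two nodes has. With 200m relationships spread over 18k nodes, and each relationship attached to two nodes, the average works out to roughly 22,000 relationships per node, which is the same order as the 22,211 db hits reported for Expand(Into). The sketch below is only a diagnostic, again assuming a hypothetical `User` label:

```cypher
// Assumption: User is a placeholder for the actual node label.
// size() over a bare pattern counts the relationships attached to a node;
// the planner can usually answer it from the stored degree
// instead of expanding every relationship.
MATCH (m:User {User_ID: '1000'}), (n:User {User_ID: '1232'})
RETURN size((m)--()) AS m_degree, size((n)--()) AS n_degree
```

If both degrees are in the tens of thousands, the cost of the expand step would come mostly from how dense the nodes are rather than from the properties stored on the relationships, though the full profile would be needed to confirm that.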

andrew_bowman · ‎10-14-2020

To double-check, please run a PROFILE of the query, and provide the query plan with all elements expanded. That may give us clues to what's going on.
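
For reference, a profiled version of the query might look like the sketch below. The `User` label is a placeholder, since the actual label name is not given in the thread:

```cypher
// Assumption: User is a placeholder for the actual node label.
// PROFILE executes the query and reports rows and db hits per operator.
PROFILE
MATCH (m:User {User_ID: '1000'})-[r]-(n:User {User_ID: '1232'})
RETURN m, n, r
```

Unlike EXPLAIN, PROFILE actually runs the query, so on a cold first run it can take as long as the original query did.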

Querying relationships slow performance