10-14-2020 02:12 AM
Hi there,
I've got a database with 18k nodes and 200m relationships. I've used one of the node properties ("User_ID") as the ID, and I've created a unique constraint on this property.
When I run a query to get 2 random nodes by their User IDs, the results return quickly:
MATCH (m {User_ID: '1000'}), (n {User_ID: '1232'})
RETURN m,n
However, when I return all relationships between these 2 nodes, the query takes ~20 minutes the first time it runs:
MATCH (m {User_ID: '1000'})-[r]-(n {User_ID: '1232'})
RETURN m,n,r
These relationships have 7 properties each. How can I improve the performance of this query? I've read that it's generally a bad idea to put properties on relationships; is this why? I'm unable to find any material on indexing relationships. I thought they would have a clustered index applied to them by default.
I'd like to keep the properties on the relationships, as turning them into nodes would clutter the graph, unless that is the only reason the query performs poorly.
Specs:
Many thanks,
Nick
10-14-2020 02:36 AM
Hello Nick,
This is because you are not using the index that you have created. You have to add a label to the nodes you are matching so that the index is used. Otherwise, your query will perform an AllNodesScan, which is very bad for performance.
MATCH (m:Label {User_ID: '1000'})-[r]-(n:Label {User_ID: '1232'})
RETURN m,n,r
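For context: in Neo4j a unique constraint is defined on a label/property pair, so the index backing it can only be used when the pattern names that label. A minimal sketch, assuming a hypothetical label `User` (substitute your actual node label) and a recent Neo4j version; the constraint syntax differs between releases, as noted in the comments:

```cypher
// Hypothetical label `User`; replace it with your actual node label.
// Neo4j 4.4+/5.x syntax. Older 4.x releases used:
//   CREATE CONSTRAINT ON (u:User) ASSERT u.User_ID IS UNIQUE
CREATE CONSTRAINT user_id_unique IF NOT EXISTS
FOR (u:User) REQUIRE u.User_ID IS UNIQUE;

// Naming the label on both endpoints lets the planner use the
// constraint-backed index (NodeUniqueIndexSeek) instead of AllNodesScan.
MATCH (m:User {User_ID: '1000'})-[r]-(n:User {User_ID: '1232'})
RETURN m, n, r;
```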
For starters, you should take a look at the query tuning documentation.
Hope this helps
10-14-2020 03:22 AM
Hi Tarendran,
Thanks for your reply. I should have specified - I only have one node label and one link label at the moment, so unfortunately specifying these hasn't made much of a difference.
I've looked at the profiler, and unsurprisingly, the Expand(Into) operation seems most taxing, producing 22,211 db hits. Is there a way to reduce this? Other operations include 2x NodeUniqueIndexSeek, CartesianProduct and ProduceResults, but these aren't nearly as bad as the Expand(Into) operation.
Thank you for your help so far
Nick
10-14-2020 03:53 AM
Hi Nick,
Since you are looking for all the relationships between these two nodes, the query has to expand all of them to traverse the graph. You might want to take a look at your graph model. The basic idea behind graph modelling is to determine which questions you will be asking and what the answers should be; based on this, you want to reach each answer with the shortest possible graph traversal (note that this may not always be possible).
Best of luck
10-14-2020 05:15 PM
To double-check, please run a PROFILE of the query, and provide the query plan with all elements expanded. That may give us clues to what's going on.
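A minimal sketch of what to run, again assuming a hypothetical `User` label (use your own). Prefixing a statement with `PROFILE` executes it and reports each operator in the plan (NodeUniqueIndexSeek, Expand(Into), and so on) with its rows and db hits:

```cypher
// PROFILE executes the query and records per-operator rows and db hits.
// `User` is a hypothetical label; substitute your actual node label.
PROFILE
MATCH (m:User {User_ID: '1000'})-[r]-(n:User {User_ID: '1232'})
RETURN m, n, r;
```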