02-07-2023 10:34 AM - edited 02-07-2023 04:28 PM
Hello,
I would like to get the beehive's thoughts on this topic.
I have a pretty dense graph consisting of 700,000 nodes and 14,000,000 relationships. We also spent months refactoring the graph data model so now we have something that works and is efficient enough. On some of these relationship types we have multiple relationship properties that I would like to use in other parts of the query to filter nodes.
My question is: what is the best practice in terms of performance and efficiency to use relationship properties to filter nodes in the graph database?
For example, let's say I have this query:
MATCH(m:Machine)-[r1:ACTIVITY_ON {datetime, processes_id, storage_id}]->(p:Project)-[r2:SOLD_TO]->(c:Customer)-[r3:USED_WITH]->(p:Device {process: r1.process_id})
In this example I'm trying to use the properties on the r1 relationship to filter the Device nodes, so that the process ID on each Device node matches the process ID on the r1 relationship between Machine and Project nodes.
This example captures exactly what I'm trying to do, but with about 7 million relationships between Machine and Project nodes.
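For reference, the intent described above could be written as valid Cypher roughly as follows. This is only a sketch: it assumes the relationship property is named processes_id and the Device property is named process, and it uses a separate variable d for the Device node because p is already bound to the Project. A property map needs key: value pairs, so the comparison is moved into a WHERE clause.

```cypher
// Sketch only: property names processes_id and process are assumptions
// taken from the example query above.
MATCH (m:Machine)-[r1:ACTIVITY_ON]->(p:Project)-[r2:SOLD_TO]->(c:Customer)-[r3:USED_WITH]->(d:Device)
WHERE d.process = r1.processes_id
RETURN m, r1, p, c, d
LIMIT 25
```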
Thanks
02-07-2023 02:04 PM
Or is it faster and more efficient to do something like
MATCH(m:Machine)-[r1:ACTIVITY_ON {datetime, processes_id, storage_id}]->(p:Project)
WITH m, r1, p
MATCH(m)-[r1]->()-[r2:SOLD_TO]->(c:Customer)-[r3:USED_WITH]->(p:Device {process: r1.process_id})
The thought here is that I collect the relationship properties in the first MATCH and then pass them to the next MATCH using WITH.
Not sure what would be better in my case.
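One way to compare the candidates on real data is to prefix each with PROFILE and look at the rows and db hits reported per operator. The query below is a sketch of the two-step form, not a tuned query: the property names are assumptions carried over from the example, and RETURN count(*) is used only to keep the output small.

```cypher
// Sketch only: run once per candidate query and compare the profiles.
PROFILE
MATCH (m:Machine)-[r1:ACTIVITY_ON]->(p:Project)
WITH m, r1, p
MATCH (p)-[r2:SOLD_TO]->(c:Customer)-[r3:USED_WITH]->(d:Device)
WHERE d.process = r1.processes_id
RETURN count(*)
```

EXPLAIN can be used instead of PROFILE to see the plan without running the query.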
02-07-2023 08:34 PM
Try this:
// Select Machine and Project nodes where r1.processes_id is not null.
// Add filters on datetime, processes_id, or storage_id here to narrow the search.
MATCH (m:Machine)-[r1:ACTIVITY_ON]->(p:Project)
WHERE r1.processes_id IS NOT NULL
// Check whether the property is processes_id or process_id and use the correct one.
WITH m, r1, p
// Collect the devices whose process matches the relationship property.
MATCH (d:Device {process: r1.processes_id})
WITH m, r1, p, d
// Match the Project and Customer that connect to those devices.
MATCH (p)-[r2:SOLD_TO]->(c:Customer)-[r3:USED_WITH]->(d)
RETURN m, r1, p, c, d LIMIT 22
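The lookup MATCH (d:Device {process: r1.processes_id}) runs once per incoming row, so it may benefit from an index on the Device property. The statements below are a sketch, assuming a recent Neo4j version (4.3 or later for the relationship index); the index names are made up for illustration, and whether the planner actually uses them should be checked with PROFILE.

```cypher
// Sketch only: index names are hypothetical.
// Node property index to support the Device lookup by process.
CREATE INDEX device_process IF NOT EXISTS
FOR (d:Device) ON (d.process);

// Optional relationship property index on ACTIVITY_ON (Neo4j 4.3+).
CREATE INDEX activity_on_processes_id IF NOT EXISTS
FOR ()-[r:ACTIVITY_ON]-() ON (r.processes_id);
```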
02-08-2023 11:52 AM
Hi @ameyasoft ,
I've been implementing your approach and it is helping out tremendously. Let me test it out some more before I mark your post as the accepted solution.
So far with my test queries I'm seeing huge improvements in query processing time, like 10X improvements.
So the key idea here is to first match the relationship and carry the "r1" variable forward with WITH, so that all of the relationship properties I need stay available. The next MATCH then collects the nodes I need, and another WITH passes those nodes to the final MATCH that returns the results.
I think I recall being taught in the Query Tuning Cypher class to always collect the nodes early, so this makes sense.
Cheers!
02-08-2023 12:14 PM
Thanks for your appreciation!