Neo4j

carib · 02-07-2023

Hello,

I would like to get the beehive's thoughts on this topic.

I have a pretty dense graph consisting of 700,000 nodes and 14,000,000 relationships. We also spent months refactoring the graph data model so now we have something that works and is efficient enough. On some of these relationship types we have multiple relationship properties that I would like to use in other parts of the query to filter nodes.

My question is: what is the best practice in terms of performance and efficiency to use relationship properties to filter nodes in the graph database?

For example, let's say I have this query:

MATCH (m:Machine)-[r1:ACTIVITY_ON {datetime, processes_id, storage_id}]->(p:Project)-[r2:SOLD_TO]->(c:Customer)-[r3:USED_WITH]->(d:Device {process: r1.process_id})

In this example, what I'm trying to do is use the relationship properties found in r1 to filter the Device nodes, so that the process ID on each Device node matches the process ID found on the r1 relationship between Machine and Project nodes.

This example captures exactly what I'm trying to do, but with about 7 million relationships between Machine and Project nodes.

Thanks

carib · 02-07-2023

Or is it faster and more efficient to do something like this:

MATCH (m:Machine)-[r1:ACTIVITY_ON {datetime, processes_id, storage_id}]->(p:Project)
WITH m, r1, p
MATCH (m)-[r1]->()-[r2:SOLD_TO]->(c:Customer)-[r3:USED_WITH]->(d:Device {process: r1.process_id})

The thought here is that I collect the relationship properties in the first MATCH and then pass them to the second MATCH using the WITH clause.

Not sure what would be better in my case.

ameyasoft · 02-07-2023

Try this:

// Select Machine and Project nodes where r1.processes_id is not null.
// Optionally filter on specific datetime, processes_id, or storage_id values to narrow the search.
// Check whether the property is processes_id or process_id and use the correct one.

MATCH (m:Machine)-[r1:ACTIVITY_ON]->(p:Project)
WHERE r1.processes_id IS NOT NULL

WITH m, r1, p

// Collect the devices whose process matches the selection above.
MATCH (d:Device {process: r1.processes_id})

WITH m, r1, p, d

// Match the Project and Customer connected to those devices.
MATCH (p)-[r2:SOLD_TO]->(c:Customer)-[r3:USED_WITH]->(d)

RETURN m, r1, p, c, d LIMIT 22
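
A possible companion step, offered as a sketch rather than something stated in this thread: the second MATCH looks up Device nodes by their process value once per incoming row, so an index on that property can let the planner use an index seek instead of scanning every Device node. The index name device_process is made up for illustration, and the syntax assumes Neo4j 4.x or later.

```cypher
// Hypothetical index name (device_process); adjust the label and
// property if your model differs. IF NOT EXISTS makes this safe to re-run.
CREATE INDEX device_process IF NOT EXISTS
FOR (d:Device) ON (d.process);
```

Whether the planner actually picks the index depends on the data and the Neo4j version, so it is worth confirming in the query plan.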

carib · 02-08-2023

Hi @ameyasoft ,

I've been implementing your approach and it is helping out tremendously. Let me test it out some more before I mark your post as the accepted solution.

So far with my test queries I'm seeing huge improvements in query processing time, like 10X improvements.

So the key idea here is to first match the relationship and carry its variable (r1) forward with a WITH clause, so that all of the relationship properties I need are available. Then, in the next MATCH, I collect the nodes I need and use another WITH to pass them to the final MATCH, which returns the results.

I think I recall being taught in the Cypher query tuning class to always collect the nodes early, so this makes sense.
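
One way to check the staged pattern described above is to prefix the query with PROFILE, which runs it and reports rows and db hits per operator. This is a minimal sketch, assuming the property is named processes_id (as noted in the answer, it may be process_id in the actual data) and that SOLD_TO points from Project to Customer as in the original question.

```cypher
// PROFILE executes the query and shows the plan with db hits per operator.
PROFILE
// Stage 1: bind the relationship so its properties can be reused.
MATCH (m:Machine)-[r1:ACTIVITY_ON]->(p:Project)
WHERE r1.processes_id IS NOT NULL
WITH m, r1, p
// Stage 2: find devices whose process matches the relationship property.
MATCH (d:Device {process: r1.processes_id})
// Stage 3: keep only devices reachable from the same project via a customer.
MATCH (p)-[:SOLD_TO]->(c:Customer)-[:USED_WITH]->(d)
RETURN m, r1, p, c, d
LIMIT 22
```

Running the single-pattern version of the query under PROFILE as well gives a like-for-like comparison of total db hits.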

Cheers!

ameyasoft · 02-08-2023

Thanks for your appreciation!

Using Relationship Properties to Filter Other Nodes