Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
02-10-2021 12:18 PM
Let's assume in a shopping platform, every user is part of a single customer group and every transaction is from a customer to another customer. The data is modelled by the following paths:
(:CustomorGroup)<-[:PART_OF]-(:User)-[:TRANSACTION_FROM)->(:Transaction)-[:TRANSACTION_TO]->(:User)-[:PART_OF]->(:CustomorGroup)
Now the goal is to write a query the finds all pairs of CustomerGroups that are connected through transactions. One way of doing this is:
MATCH (cg1:CustomorGroup),(cg2:CustomorGroup) WHERE (cg1)<-[:PART_OF]-(:User)-[:TRANSACTION_FROM)->(:Transaction)-[:TRANSACTION_TO]->(:User)-[:PART_OF]->(cg2) RETURN DISTINCT cg1, cg2
Another solution which is more efficient is:
MATCH (cg1:CustomorGroup)<-[:PART_OF]-(:User)-[:TRANSACTION_FROM)->(:Transaction)-[:TRANSACTION_TO]->(:User)-[:PART_OF]->(cg2:CustomorGroup) RETURN DISTINCT cg1, cg2
But in general none of these are efficient enough because they first find all paths between pairs of CustomerGroup nodes while we are interested in knowing if there exists a path. In my data there could be many many paths between two CustomerGroups but I care only if there exists a path. I tried to tune it further by adding a hint to join on the Transaction node but still didn't change the performance a lot.
What are the more efficient queries for this usecase? How can I somehow include this logic in the query that I only care about path existence between two CustomerGroups?
All the sessions of the conference are now available online