07-26-2021 03:08 PM
Hello, I have a project that contains 109k nodes and 2.8 million connections. The graph has Place nodes, Visited edges, and census block group (CensusBG) nodes. I am trying to find the top 10 pairs of places that share the most census block group nodes within a given month.
Match (n:Place)<-[r:Visited{month: "April"}]-(c:CensusBG)-[b:Visited{month: "April"}]->(m:Place)
return n.name, m.name, count(distinct c) as cnt
order by cnt desc limit 10
I tried running it overnight, but it still did not work, and I had to restart my machine to get the database back up again.
07-26-2021 11:25 PM
This is a very large graph-wide query.
First, I think we'd recommend running this on Neo4j 4.3.x and creating a relationship index on :Visited(month).
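A minimal sketch of that index, assuming Neo4j 4.3 or later (where relationship property indexes are available). The index name visited_month is only an illustrative choice, not something from the original post:

```cypher
// Hypothetical index name; relationship property indexes need Neo4j 4.3+
CREATE INDEX visited_month IF NOT EXISTS
FOR ()-[r:Visited]-()
ON (r.month);
```

Whether the planner actually uses the index for this pattern is worth checking by prefixing the query with EXPLAIN.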
Also, since your pattern is symmetric, it would be better to add a filter so you don't see symmetric results (the same two places and the same count, with just the places swapped for n and m). You can add WHERE id(n) < id(m); that should do the trick.
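Applied to the query from the question, that would look roughly like this (same labels, relationship type, and properties as in the original post):

```cypher
MATCH (n:Place)<-[:Visited {month: "April"}]-(c:CensusBG)-[:Visited {month: "April"}]->(m:Place)
// Keep only one ordering of each pair of places
WHERE id(n) < id(m)
RETURN n.name, m.name, count(DISTINCT c) AS cnt
ORDER BY cnt DESC
LIMIT 10
```

This halves the number of result rows, and it also drops the rows where n and m are the same place.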
That said, if the issues you're facing are heap-related, it would be best to use subqueries to divide up the work, which should make aggregations less of a threat to your heap. Something like this maybe:
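MATCH (n:Place)
CALL {
  // Run the match and aggregation for one starting :Place at a time
  WITH n
  MATCH (n)<-[r:Visited {month: "April"}]-(c:CensusBG)-[b:Visited {month: "April"}]->(m:Place)
  WHERE id(n) < id(m)
  // count(c) counts matched paths; use count(DISTINCT c), as in the question,
  // if a block group can have more than one April visit to the same place
  RETURN n.name AS name1, m.name AS name2, count(c) AS cnt
}
RETURN name1, name2, cnt
ORDER BY cnt DESC
LIMIT 10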
This ensures the aggregations are performed with respect to only the matches from a single starting :Place node at a time, instead of executing over all possible paths in the graph, which was likely blowing your heap.