RE: How to Aggregate calculation of data faster?

shash

Hi Michael,

I am facing a very similar problem, and you might be able to suggest a solution quickly. Do you have a minute? I'm a big fan of the clarity of your answers in the community.

https://community.neo4j.com/t/how-to-aggregate-calculation-of-data-faster/4131/4

shash

Basically, I have a graph with genres and tracks: about 1,500 genres and 7 million tracks.
The page cache is 10 GB, the heap is 10 GB, and the database plus indexes total 7.9 GB.

WITH ['rock', 'metal'] AS genres_list
MATCH  (t:Track)-[:HAS_GENRE]->(g:Genre)
WHERE g.name IN genres_list
RETURN t.name, count(DISTINCT g) as score ORDER BY score DESC LIMIT 20 

When I run this, I get millions of db hits on the expand and filter steps, and I have no idea how to optimize it.

Sorry for the delay, I didn't see your message.
Please post your question in the #neo4j-graph-platform:cypher category so that folks can help you.

count(DISTINCT g) will only ever be 1 or 2 here, because only two genres are matched, which is probably not what you want.

You probably want this:

  1. don't aggregate on properties if you can avoid it
  2. use the degree to compute your score

But in general, you can imagine why you get millions of db hits when you fetch all tracks of those popular genres.

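WITH ['rock', 'metal'] AS genres_list
MATCH (t:Track)-[:HAS_GENRE]->(g:Genre)
WHERE g.name IN genres_list
// Pre-filter: keep each track once (a track in both genres matches twice)
// and score it by its HAS_GENRE degree, i.e. its total number of genres.
// On Neo4j 5+, size() no longer accepts a pattern; use COUNT { (t)-[:HAS_GENRE]->() }.
WITH DISTINCT t, size((t)-[:HAS_GENRE]->()) AS score
WHERE score > 1
WITH t, score
ORDER BY score DESC LIMIT 20
// Read the name property only for the 20 rows that survive
RETURN t.name, score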
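
If the score you actually want is the number of requested genres a track matches (as in the original query) rather than its total genre count, point 1 from the list can be applied on its own. Here is a minimal sketch, assuming the same labels, relationship type, and `name` property as in the question. It still has to expand every track of the listed genres, so it should mainly cut property reads rather than the expand cost.

```cypher
WITH ['rock', 'metal'] AS genres_list
// Find the few matching genres first
MATCH (g:Genre)
WHERE g.name IN genres_list
// Expand from those genres to their tracks
MATCH (t:Track)-[:HAS_GENRE]->(g)
// Group by the node t, not by the t.name property
WITH t, count(DISTINCT g) AS score
ORDER BY score DESC
LIMIT 20
// Only now read the name, for the 20 rows that are left
RETURN t.name AS name, score
```

Grouping by `t` also keeps two different tracks that share a name from being merged into one row, which grouping by `t.name` would do.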