Neo4j

shash · ‎11-28-2019

Hi Michael,

I am facing a problem very similar to the one in the thread below, and you might be able to suggest a solution quickly. Do you have a minute? I'm a big fan of the clarity of your answers in the community.

https://community.neo4j.com/t/how-to-aggregate-calculation-of-data-faster/4131/4

shash · ‎11-28-2019

Basically, I have a graph with genres and tracks: about 1,500 genres and 7 million tracks.
Page cache size is 10 GB, heap is 10 GB, and the database plus index size is 7.9 GB.

WITH ['rock', 'metal'] AS genres_list
MATCH  (t:Track)-[:HAS_GENRE]->(g:Genre)
WHERE g.name IN genres_list
RETURN t.name, count(DISTINCT g) as score ORDER BY score DESC LIMIT 20

When I run this, I get millions of db hits on the expand and filter steps, and I have no idea how to optimize it.

michael_hunger · ‎01-21-2020

Sorry for the delay, I didn't see your message.
Please post your question in the #neo4j-graph-platform:cypher category so that folks can help you.

count(DISTINCT g) will only ever be 1 or 2 here, because a track can match at most the two genres in the list, which is probably not what you want.

You probably want this:

- Don't aggregate on properties if you can avoid it.
- Use the degree to compute your score.

But in general, you can imagine why you get millions of db hits if you fetch all tracks of those popular genres.

WITH ['rock', 'metal'] AS genres_list
MATCH (t:Track)-[:HAS_GENRE]->(g:Genre)
WHERE g.name IN genres_list
// some pre-filtering
// DISTINCT keeps a track that matches both genres from appearing twice
WITH DISTINCT t, size((t)-[:HAS_GENRE]->()) AS score
WHERE score > 1
WITH t, score
ORDER BY score DESC LIMIT 20
RETURN t.name, score
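
For comparison, here is a minimal sketch of the first tip (don't aggregate on properties) applied to the original query, keeping its original score of "number of listed genres matched" rather than the degree. It reuses only the labels, relationship type, and property names from the thread. It assumes, without confirmation from the thread, that an index on :Genre(name) exists so the genre lookup is cheap. It still expands every track of the listed genres, so it will likely not remove the db hits on the expand step. It only defers reading t.name until after the LIMIT.

```cypher
// Sketch only; assumes an index on :Genre(name), which the thread does not confirm.
WITH ['rock', 'metal'] AS genres_list
MATCH (g:Genre)
WHERE g.name IN genres_list
MATCH (t:Track)-[:HAS_GENRE]->(g)
// Group by the Track node itself instead of t.name,
// so no property is read for each of the expanded rows.
WITH t, count(DISTINCT g) AS score
ORDER BY score DESC
LIMIT 20
// The name property is loaded only for the 20 remaining tracks.
RETURN t.name AS name, score
```

Prefixing either query with PROFILE shows the db hits per operator, which is a reasonable way to check whether the change helps on your data.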


RE: How to Aggregate calculation of data faster?