How to fetch data from a database having around 60 million nodes of one type?

Hi,
I need to run a simple match query:
match(n:label)-[:has_example]->(m:label2) with m, count(n.property) as c, collect(n.property) as c2 where c>100
return m.property, c,c2 order by c desc limit 10
on 60+ million nodes.

My server is not able to run it. How can I make it run, in parallel or in any other way?

4 REPLIES

I am confused by your query. You have
collect(n.property) as c, collect(n.property) as c2
What is the point of collecting it twice?

As for running in parallel, one method would be apoc.periodic.iterate. In that case you can't simply return something; the inner statement has to write (SET or CREATE). Since you are grouping by m, you could do something like this (the count is computed in a WITH first, because an aggregation cannot be used directly inside SET):
CALL apoc.periodic.iterate(
  "MATCH (m:label2) WHERE size((m)<-[:has_example]-()) > 0 RETURN m",
  "WITH m MATCH (m)<-[:has_example]-(n:label) WITH m, count(n.property) AS c SET m.count_property = c, m.count_property2 = c",
  {batchSize:10000, parallel:true, iterateList:true})
Then you could run something like
MATCH (m:label2) RETURN m.property, m.count_property AS c,m.count_property2 AS c2 ORDER BY c DESC LIMIT 10
This can further be sped up with CREATE INDEX ON :label2(count_property)
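
If only the number of incoming relationships matters, the same size() pattern used in the batch query can rank the nodes directly, without writing a property first. This is a rough sketch, not a drop-in replacement: it counts has_example relationships rather than non-null n.property values, it does not check the label of the start node, and it assumes a Neo4j version that still accepts size() on a pattern.

```cypher
// Sketch: read the relationship degree of each m directly, no batch write step.
// size() on a pattern matches the syntax used above; newer Neo4j versions
// replace it with COUNT { (m)<-[:has_example]-() }.
MATCH (m:label2)
WITH m, size((m)<-[:has_example]-()) AS c
WHERE c > 100
RETURN m.property, c
ORDER BY c DESC
LIMIT 10
```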

MATCH (n:user)-[:has_mobile]->(m:Mobile) with m.mobile as Mob_Number, count(n.ID) as c, collect(n.ID) as id
where c>100
return Mob_Number,id, c order by c desc limit 10

This is the exact query: one is count and the other is collect.

I just need to read the data for the top ten counts.

Then it is the same as above, except that once you get the top ten in the second query, you expand those in order to get the collection of n.ID.
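
A sketch of that second query for the user/Mobile case, assuming the batch step has already stored the count on each Mobile node as count_property (that property name comes from the earlier example, not from your data):

```cypher
// Pick the ten Mobile nodes with the highest precomputed count first.
MATCH (m:Mobile)
WHERE m.count_property > 100
WITH m
ORDER BY m.count_property DESC
LIMIT 10
// Only now expand to the users, so collect() runs for ten nodes at most.
MATCH (n:user)-[:has_mobile]->(m)
RETURN m.mobile AS Mob_Number, m.count_property AS c, collect(n.ID) AS id
ORDER BY c DESC
```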

What does your memory config and disk IO look like?

Can you run PROFILE instead of EXPLAIN?
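
For example, something along these lines, using only the count part of the query so that it has a better chance of finishing (a sketch; adjust to whatever query you actually profiled):

```cypher
// PROFILE executes the query and reports rows and db hits per operator;
// EXPLAIN only shows the planned operators without running anything.
PROFILE
MATCH (n:user)-[:has_mobile]->(m:Mobile)
WITH m, count(n.ID) AS c
WHERE c > 100
RETURN m.mobile AS Mob_Number, c
ORDER BY c DESC
LIMIT 10
```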

You could try this:

match(n:label)-[:has_example]->(m:label2) 
with m, count(*) as c, collect(n) as c2 where c>100
return m.property, c,c2 
order by c desc limit 10

Or even better, just do the cheap aggregation + sorting first, then go and re-fetch the related data for the 10 top nodes.

Please take into account that it might blow up the browser if your c is really large, as you would return lists with millions of property values.

match(n:label)-[:has_example]->(m:label2) 
with m, count(*) as c where c>100
with m, c order by c desc limit 10
match (n:label)-[:has_example]->(m) 
with m, c, collect(n.property) as c2 
return m.property, c, c2
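
If the returned lists are still too large for the browser, one option is to slice each collected list before returning it. A sketch, where the cap of 100 is an arbitrary illustrative value; note that collect() still builds the full list on the server, so this only shrinks what is sent back:

```cypher
MATCH (n:label)-[:has_example]->(m:label2)
WITH m, count(*) AS c
WHERE c > 100
WITH m, c
ORDER BY c DESC
LIMIT 10
MATCH (n:label)-[:has_example]->(m)
// Keep at most 100 values per node; c still holds the full count.
WITH m, c, collect(n.property)[..100] AS sample
RETURN m.property, c, sample
ORDER BY c DESC
```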