Fast count with count store on entity relations

Hello internet!

My team and I are trying to write a query that counts the Info nodes held by more than a given number of people. In Cypher terms, here's our query:

MATCH (info:Info)
WITH info, size((:Person)-[:HAS_INFO]->(info)) as peopleCount
WHERE peopleCount > 3
RETURN count(info)

We currently have around 150,000 Info nodes in the database, and the query profile is pretty terrible. Here's what we can see:

As you can see, Neo4j seems to iterate over all the data just to count it, and indeed, this query performs really poorly (500ms).

We tried looking for a different approach that would use the count store, but we can't seem to find a way to make this query faster.

Is there a magic APOC procedure or anything else that would allow us to speed up this query, considering that the number of Info nodes will increase over time?

2 REPLIES

ameyasoft
Graph Maven
Try this:

MATCH (p:Person)-[:HAS_INFO]->(i:Info)
WITH id(i) AS ID, count(DISTINCT p) AS Cnt WHERE Cnt >= 3
RETURN ID as infoID, Cnt as peopleCount ORDER BY peopleCount DESC LIMIT 20

Thank you for your reply. When I apply your suggestion, it is indeed a little faster (around 150ms), but that still makes me worry about how it will behave with a much larger number of Info nodes (millions of them).

I had to edit your query to get what I want out of it, so here's what I have:

MATCH (p:Person)-[:HAS_INFO]->(info:Info)
WITH id(info) as ID, count(p) as peopleCount
WHERE (peopleCount >= 3) 
RETURN count(ID)

Also, the actual query is a little bigger than that, but I tried to simplify the problem by providing only a part of it. If you want the real query, here it is:

MATCH (info:Info)-[:MATCHES]->(pattern:Pattern)-[:PART_OF]->(patternGroup:PatternGroup)
WHERE ($sha256 = [] OR info.sha256 IN $sha256) AND
      ($patternGroups IS NULL OR patternGroup.id IN $patternGroups) AND
      (info.likelihood >= 0.5)
                  
WITH info, pattern, patternGroup, size((:Person)-[:HAS_INFO]->(info)) as peopleCount
WHERE ($minPeopleCount IS NULL OR peopleCount >= $minPeopleCount) AND
      ($maxPeopleCount IS NULL OR peopleCount < $maxPeopleCount)

RETURN count(info)

Our problem occurs with these params:

{ sha256: [], patternGroups: null, minPeopleCount: null, maxPeopleCount: null }

We tried the solution you proposed on our actual query, and the result is roughly the same as with size((:Person)-[:HAS_INFO]->(info)).
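
One direction that may be worth profiling: the count store only keeps aggregate counts (per label, and per relationship type with at most one labelled end), so it cannot answer a per-node threshold like this one directly. However, when the pattern inside size() has no label on the far end, the planner can usually read the relationship degree from the node itself instead of expanding every relationship. The sketch below assumes that only Person nodes ever have a HAS_INFO relationship pointing at an Info node; if that does not hold in your model, the result will differ from the original query.

```cypher
// Assumption: every incoming HAS_INFO relationship on an Info node starts at a Person,
// so dropping the :Person label check does not change the count.
MATCH (info:Info)
// No label on the far end, so the degree can be read from the node
// rather than computed by traversing each relationship.
WITH info, size((info)<-[:HAS_INFO]-()) AS peopleCount
WHERE peopleCount > 3
RETURN count(info)
```

Running it with PROFILE should show whether the plan uses a degree lookup rather than an Expand. The same substitution can be tried in the larger query by replacing size((:Person)-[:HAS_INFO]->(info)). Note that it still scans every Info node, so it lowers the cost per node rather than avoiding the scan.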