Neo4j

Dowanna · ‎12-19-2019

For a data like this:

create (a:User{id:'1'}),(b:User{id:'2'}),(c:User{id:'3'})

I'd like to get all User nodes, and its total count with one query.
I first tried this:

match (u:User)
with u, count(u) as total
return u.id, total

But this gives the following:

u.id	total
"1"	1
"2"	1
"3"	1

If I try this:

match (u:User)
with collect(u) as users, count (distinct u) as total
unwind users as u
return u.id, total

I get what I need:

u.id	total
"1"	3
"2"	3
"3"	3

Why would collect + unwind yield a different result?
Is this the most efficient query to get nodes and total counts at once?

Thomas_Silkjaer · ‎12-19-2019

It's inefficient for sure, but the fastest approach I have found is to run the match twice.

MATCH (u:User)
WHERE some condition
WITH COUNT(u) AS count
MATCH (u:User)
WHERE the same condition
RETURN u, count

In that way you can get a total count, and in the second match make a SKIP/LIMIT to introduce paging if that's the purpose.

12kunal34 · ‎12-19-2019

you can use apoc procedure

CALL apoc.meta.stats() YIELD labels
RETURN labels

it will return all nodes count with a label

dana_canzano · ‎12-19-2019

regarding the 2 queries returning different results, this is expected since the queries are returning results based upon different query designs. For example the first

match (u:User)
with u, count(u) as total
return u.id, total

and the with u, count(u) is aggregating the count and grouping by u. Since each u is unique, the count would thus be 1 for each u.

However for

match (u:User)
with collect(u) as users, count (distinct u) as total
unwind users as u
return u.id, total

the collect(u) as users creates a single collection of all users and for this collection we count all users and thus a single collection of users and a count of 3.
And then unwinding the single collection simply reports each element of the collection and the count of all users

Dowanna · ‎12-19-2019

Hi dana,

Many thanks for a detailed response!

May I clarify one more point regarding with/collect?

I'm not quite sure why collect(u) would alter u itself.

I assumed collect(u) to be a non-destructive function, returning a list of u, but leaving u intact.
What you explained (count(u) before collect and after collect makes a difference) suggests otherwise.
Have I understood something totally wrong here...?

match (u:User)
with u, count(u) as total1
with collect(u) as users, count(u) as total2, total1 // why is u altered by collect?
unwind users as u
return u.id, total1, total2 // total1=1, total2=3

dana_canzano · ‎12-20-2019

how is 'u' altered? altered such that the actual node itself and the properties it has are somehow updated/changed permanently? or

Dowanna · ‎12-20-2019

Since count(u) would return different results before/after collect(u), I assumed something had changed about 'u'. I guess I'm wrong with this way of thinking?

Also, if collect() + count() meant 'counting on a collected node group', I'm not quite sure why these two would give the same result:

with count(u) as total, collect(u) as users

with collect(u) as users, count(u) as total

If order doesn't matter, is cypher smart enough to deduce which function should be executed first? Is that written anywhere in documents?

andrew_bowman · ‎12-30-2019

To clarify, what's happening is that when you perform an aggregation (like collect() or count()), the non-aggregation variables present in the WITH (or RETURN) clause become the grouping key, which provides context for the aggregation, and also makes the grouping key distinct.

For example, if we had a month variable and a days variable, and we only had two months (January and February), with 31 days in January and 28 days in February modeled as (:Month)-[:HAS_DAY]->(:Day), and we performed the following:

MATCH (month:Month)-[:HAS_DAY]->(day:Day)
RETURN count(day) as dayCount

This would return a single row with a dayCount of 59, the count of all the day nodes found.

But if we changed the return to RETURN month, count(day) as dayCount, then we would get two rows, one for January with a dayCount of 31, and another for February with a dayCount of 28.

If we kept day as a variable like this: RETURN month, day, count(day) as dayCount we would get 59 rows back, since for each row we need to output a distinct month and day, and the dayCount would be 1 for each of those 59 rows (since per row there is only a single day, and the count is with respect to the month and the day, that forms the grouping key of the aggregation).

If you wanted the count of days of the month, yet wanted to keep each day on its own row, then you would need to collect() at the same time that you count(), then UNWIND back to rows afterward, as in Dana's suggestion:

MATCH (month:Month)-[:HAS_DAY]->(day:Day)
WITH month, collect(day) as days, count(day) as dayCount
UNWIND days as day
RETURN month, day, dayCount

At the time we aggregate, the only non-aggregation variable is month. Days are collected with respect to the month, and the days are counted with respect to the month. Then we UNWIND the list of days back to rows, leaving us with the month, a day of that month, and the count of all days in that month.

mouezapeter · ‎06-07-2021

match(n) return count(n)

moueza

Neo4j

How to get total count and nodes with one query