Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
12-19-2019 12:10 AM
For a data like this:
create (a:User{id:'1'}),(b:User{id:'2'}),(c:User{id:'3'})
I'd like to get all User nodes, and its total count with one query.
I first tried this:
match (u:User)
with u, count(u) as total
return u.id, total
But this gives the following:
u.id total
"1" 1
"2" 1
"3" 1
If I try this:
match (u:User)
with collect(u) as users, count (distinct u) as total
unwind users as u
return u.id, total
I get what I need:
u.id total
"1" 3
"2" 3
"3" 3
Why would collect + unwind yield a different result?
Is this the most efficient query to get nodes and total counts at once?
12-19-2019 02:15 AM
It's inefficient for sure, but the fastest approach I have found is to run the match twice.
MATCH (u:User)
WHERE some condition
WITH COUNT(u) AS count
MATCH (u:User)
WHERE the same condition
RETURN u, count
In that way you can get a total count, and in the second match make a SKIP/LIMIT to introduce paging if that's the purpose.
12-19-2019 03:54 AM
you can use apoc procedure
CALL apoc.meta.stats() YIELD labels
RETURN labels
it will return all nodes count with a label
12-19-2019 04:23 AM
regarding the 2 queries returning different results, this is expected since the queries are returning results based upon different query designs. For example the first
match (u:User)
with u, count(u) as total
return u.id, total
and the with u, count(u)
is aggregating the count and grouping by u
. Since each u
is unique, the count would thus be 1 for each u
.
However for
match (u:User)
with collect(u) as users, count (distinct u) as total
unwind users as u
return u.id, total
the collect(u) as users
creates a single collection of all users and for this collection we count all users and thus a single collection of users and a count of 3.
And then unwinding the single collection simply reports each element of the collection and the count of all users
12-19-2019 06:04 PM
Hi dana,
Many thanks for a detailed response!
May I clarify one more point regarding with/collect?
I'm not quite sure why collect(u) would alter u itself.
I assumed collect(u) to be a non-destructive function, returning a list of u, but leaving u intact.
What you explained (count(u) before collect and after collect makes a difference) suggests otherwise.
Have I understood something totally wrong here...?
match (u:User)
with u, count(u) as total1
with collect(u) as users, count(u) as total2, total1 // why is u altered by collect?
unwind users as u
return u.id, total1, total2 // total1=1, total2=3
12-20-2019 05:15 AM
how is 'u' altered? altered such that the actual node itself and the properties it has are somehow updated/changed permanently? or
12-20-2019 04:32 PM
Since count(u) would return different results before/after collect(u), I assumed something had changed about 'u'. I guess I'm wrong with this way of thinking?
Also, if collect() + count() meant 'counting on a collected node group', I'm not quite sure why these two would give the same result:
with count(u) as total, collect(u) as users
with collect(u) as users, count(u) as total
If order doesn't matter, is cypher smart enough to deduce which function should be executed first? Is that written anywhere in documents?
12-30-2019 02:40 PM
To clarify, what's happening is that when you perform an aggregation (like collect() or count()), the non-aggregation variables present in the WITH (or RETURN) clause become the grouping key, which provides context for the aggregation, and also makes the grouping key distinct.
For example, if we had a month
variable and a days
variable, and we only had two months (January and February), with 31 days in January and 28 days in February modeled as (:Month)-[:HAS_DAY]->(:Day)
, and we performed the following:
MATCH (month:Month)-[:HAS_DAY]->(day:Day)
RETURN count(day) as dayCount
This would return a single row with a dayCount of 59, the count of all the day nodes found.
But if we changed the return to RETURN month, count(day) as dayCount
, then we would get two rows, one for January with a dayCount of 31, and another for February with a dayCount of 28.
If we kept day
as a variable like this: RETURN month, day, count(day) as dayCount
we would get 59 rows back, since for each row we need to output a distinct month and day, and the dayCount would be 1 for each of those 59 rows (since per row there is only a single day, and the count is with respect to the month and the day, that forms the grouping key of the aggregation).
If you wanted the count of days of the month, yet wanted to keep each day on its own row, then you would need to collect() at the same time that you count(), then UNWIND back to rows afterward, as in Dana's suggestion:
MATCH (month:Month)-[:HAS_DAY]->(day:Day)
WITH month, collect(day) as days, count(day) as dayCount
UNWIND days as day
RETURN month, day, dayCount
At the time we aggregate, the only non-aggregation variable is month
. Days are collected with respect to the month, and the days are counted with respect to the month. Then we UNWIND the list of days back to rows, leaving us with the month, a day of that month, and the count of all days in that month.
06-07-2021 01:59 AM
match(n) return count(n)
moueza
All the sessions of the conference are now available online