Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
11-20-2019 05:04 PM
I'm having some trouble understanding the WITH
clause, when used with aggregation functions like count()
.
I tried using this query from the Cypher ref card as an example -
MATCH (user)-[:FRIEND]-(friend)
WHERE user.name = $name
WITH user, count(friend) AS friends
WHERE friends > 10
RETURN user
What is the purpose of count(friend)
in this query? Is it just counting all the end nodes which are connected to the given user with the FRIEND
relationship? If so, why is the query returning the given user node? Because we already know who the starting user is.
I tried loading the "movies" graph in Neo4j, to run similar queries. Then, I modified the above query for the movies graph, like so -
MATCH (h:Person{name:"Tom Hanks"})-[r:ACTED_IN]->(m:Movie)
WITH h, count(m) as movies
WHERE movies > 3
RETURN h, movies
This is what the query returns -
╒════════════════════════════════╤════════╕
│"h" │"movies"│
╞════════════════════════════════╪════════╡
│{"name":"Tom Hanks","born":1956}│12 │
└────────────────────────────────┴────────┘
In theory, I understand what this query is doing - return a count of all movies where Tom Hanks "acted in", if they are greater than 3. But in terms of the query and the aggregation function, what is count(m)
doing? Is it a count of all movie nodes where Tom Hanks is the start node with "ACTED_IN" relationship connecting the two nodes?
If so, why do the following queries not return anything?
MATCH (h:Person{name:"Tom Hanks"})-[r:ACTED_IN]->(m:Movie)
WITH h, r, count(m) as movies
WHERE movies > 3
RETURN h,r, movies
MATCH (h:Person{name:"Tom Hanks"})-[r:ACTED_IN]->(m:Movie)
WITH h,r, count(m) as movies
WHERE movies > 3
RETURN movies
MATCH (h:Person{name:"Tom Hanks"})-[r:ACTED_IN]->(m:Movie)
WITH h, m,count(m) as movies
WHERE movies > 3
RETURN h, m
MATCH (h:Person)-[r:ACTED_IN]->(m:Movie)
WITH h, r, m,count(m) as movies
WHERE movies > 6
RETURN h, m
MATCH p=(h:Person{name:"Tom Hanks"})-[r:ACTED_IN]->(m:Movie)
WITH p, count(r) as movies
WHERE movies > 3
RETURN p
Queries 1 through 4 are just different variations of the main query, where I just add different variables to the WITH clause, so they 're carried over to the next stage of the query. In Query 5, I try to count the relationships, which should ideally behave the same way as counting the destination (movie) node, but doesn't work.
I also tried fiddling with the FOLLOWS
relationship in the movie graph, with this query -
MATCH p=()-[r:FOLLOWS]->(b:Person)
WITH count(b) AS popular
WHERE popular > 1
RETURN popular;
This query returns 3, because there are three FOLLOWS relationships in the graph -
══════════════════════════════════════════════════════════╕
│"p" │
╞══════════════════════════════════════════════════════════╡
│[{"name":"Paul Blythe"},{},{"name":"Angela Scope"}] │
├──────────────────────────────────────────────────────────┤
│[{"name":"James Thompson"},{},{"name":"Jessica Thompson"}]│
├──────────────────────────────────────────────────────────┤
│[{"name":"Angela Scope"},{},{"name":"Jessica Thompson"}] │
└──────────────────────────────────────────────────────────┘
But if I add the WITH
clause and count()
to this query, it doesn't return anything
MATCH p=(a)-[r:FOLLOWS]->(b:Person)
WITH a,count(b) AS pop
WHERE pop > 1
RETURN a,pop;
I think I'm missing some key piece of underlying information about how WITH and count() works.
11-21-2019 05:39 AM
why does
not work? because for each relationship (which happens to be unique) from the Tom Hanks :Person node you are counting number of movies.
So with h, r,count(m) as movies
is effectively evaluating to
Tom Hanks, releationship001, 1
Tom Hanks, releationship002, 1
Tom Hanks, releationship003, 1
Tom Hanks, releationship004, 1
and for example relationship001
refers to the relationship to the :Movie 'Apollo 13', and relationship002
refers to the :Movie 'Joe Versus the Volcano. Remove the
rin the
with h,r,count(m)` and you will then get number of movies that Tom Hanks acted in.
Regarding
2. MATCH (h:Person{name:"Tom Hanks"})-[r:ACTED_IN]->(m:Movie)
WITH h,r, count(m) as movies
WHERE movies > 3
RETURN movies
same as #1
Regarding
3. MATCH (h:Person{name:"Tom Hanks"})-[r:ACTED_IN]->(m:Movie)
WITH h, m,count(m) as movies
WHERE movies > 3
RETURN h, m
effectively the same as #1 . the WITH h, m,count(m) as movies
effectively evaluates to
Tom Hanks, Apollo13, 1
Tom Hanks, Joe Versus the Volcano, 1
Regarding
4. MATCH (h:Person)-[r:ACTED_IN]->(m:Movie)
WITH h, r, m,count(m) as movies
WHERE movies > 6
RETURN h, m
more or less the same as #1 and #3 as this then evaluates to
Tom Hanks, relationship001, Apollo13, 1
Tom Hanks, relationship002, Joe Versus the Volcano, 1
```
Regarding
5. ```
MATCH p=(h:Person{name:"Tom Hanks"})-[r:ACTED_IN]->(m:Movie)
WITH p, count(r) as movies
WHERE movies > 3
RETURN p
```
each P represents a unique Path and thus this evaluates to
TomHanks-ACTED_IN->Apollo13, 1,
TomHanks-ACTED_IN->Joe Versus The Volcano,1
11-21-2019 11:39 AM
Adding a bit more context, when you use aggregation functions, the non-aggregation variables become the grouping key of the aggregation, which is the context by which the aggregation is performed.
In your examples that aren't returning data, your aggregation (the count of movies) is with respect to the non-aggregation variables forming the grouping key.
WITH h, r, count(m) as movies
"For each person node (Tom Hanks) and the specific relationship r
on this row, get the count of movies"
Since a specific relationship r
can only point to one movie, the count will always be 1 per row.
WITH h, m,count(m) as movies
"For each person node (Tom Hanks) and the specific movie m
on this row, get the count of movies"
Since m
on this row will always be one specific movie (for example the results of WITH h, m
), the count of that movie per row will always be 1.
And in your last example:
MATCH p=(a)-[r:FOLLOWS]->(b:Person)
WITH a,count(b) AS pop
WHERE pop > 1
RETURN a,pop;
Per single a
node per row, you're getting the count of persons that they follow, and filtering to only those that follow more than 1 person, but there are no such entries in the movies graph. For the persons that do have outgoing follows relationships, they only have a single follows relationship to another person.
05-27-2021 07:50 PM
I got tripped up with aggregating functions too.
My example is simpler, so it may be easier to understand what works or doesn't work and why by looking at it:
I believe the problem you are facing, is you need to have only ONE variable that is being aggregated in the WITH
statement. Then you can have one or more aggregation functions. Adding the r
broke things.
Here, I'm aggregating per Person and I can get the number of movies per person or average year.
MATCH (h:Person)-[r:ACTED_IN]->(m:Movie)
WITH h, count(m) as movies, avg(m.released) AS releaseYearAvg
RETURN h.name, movies, releaseYearAvg
All the sessions of the conference are now available online