Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
03-23-2020 08:41 AM
Hey,
an inconsistency I've encountered while doing exercise 5 on top of the movies db.
first query:
MATCH (m:Movie)<-[:ACTED_IN]-(a:Person)
WITH count(a) as numMovies, collect(m.title) as movies,a
WHERE numMovies = 5
RETURN a.name, movies
resulting in a table with 3 rows.
second query:
MATCH (m:Movie)<-[:ACTED_IN]-(a:Person)
WITH count(a) as numMovies, m, a
WHERE numMovies = 5
RETURN a.name, collect(m.title) as movies
returning no records.
the only difference is that I've declared the collect(m.title) in the with and not in the return.
There may be a convention in node4j that I'm not familiar with that may explain this but I would love your help to understand this situation.
thanks in advance!
03-23-2020 10:32 AM
Hi there,
Cypher doesn't have a group by like SQL. When you use aggregating functions, anything in your return clause that is not aggregated becomes part of the grouping.
In your second query WITH clause, you included both m and a. That means that you are grouping by both the move and the person. Since there is only one ACTED_IN relationship connecting an actor to an individual movie, numMovies is always equal to 1. You can run the following query to see why your WHERE clause is filtering out all of the data.
MATCH (m:Movie)<-[:ACTED_IN]-(a:Person)
RETURN count(a) as numMovies, m, a
ORDER BY a.name
Best wishes,
Nathan
04-15-2020 03:31 AM
Hi,
I am trying to understand the difference of
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WITH a, count(a) AS numMovies, collect(m.title) AS movies
WHERE numMovies = 5
RETURN a.name, movies
vs
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WITH a, count(m) AS numMovies, collect(m.title) AS movies //notice the count(m)
WHERE numMovies = 5
RETURN a.name, movies
Both queries return the same result.
For me it's more intuitive to say count(m) to find number of movies than count(a).
But the tutorials use count(a).
Why is count(a) a better practice than count(m)?
Thanks.
04-15-2020 07:15 AM
In this pattern
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
Since it is a simple single step count(a) or count(m) both will give the distinct
(a)-->(m) paths. So either one will give same result. Once you have longer pattern things can get different.
04-22-2020 06:50 PM
count(a) is not better practice than count(m) here, you're right.
Given the context provided by the alias being used here (numMovies), the query should be favoring count(m)
. The results may not change, but it's important I think to align the context with what you're doing, especially if we have to alter the query to count something related to movies, or to count(DISTINCT m). If it was left as a
, then such an alternation may overlook the fact that we're aggregating on the wrong variable.
04-15-2020 07:09 AM
Hi Alex,
You are correct that count(a), count(m), and count(*) will return the same result for this query. As far as I know, one is not better than the other.
You might find this article helpful if you need very fast counts. https://neo4j.com/developer/kb/fast-counts-using-the-count-store/ The examples are from the movie database.
Best wishes,
Nathan
All the sessions of the conference are now available online