Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
03-11-2022 09:48 PM
I have written the below query to retrieve movie title
and corresponding rating
:
match (rvwr:Person)-[r:REVIEWED]->(m:Movie)
where m.released > 2003
return m.title, avg(r.rating) as rating
I understand that groupby is done implicitly for m.title
.
But, how groupby works if there are more than 1 columns as below:
match (rvwr:Person)-[r:REVIEWED]->(m:Movie)
where m.released > 2003
return m.title, rvwr.born, avg(r.rating) as rating
03-12-2022 02:06 AM
The result of your query is a set of rows, with values for rvwr, r, and m. These values can repeat on several lines because a reviewer can review many movies and a movie can be reviewed by many reviewers.
When you use an aggregate function in a WITH or RETURN statement, all values not included as parameters of the aggregate function form the grouping criteria.
In your example, every combination of m.title and rvwr.born would be a group and the average would be computed over the subset of records with the same combination of m.title and rvwr.born.
To visualize this, run the two queries below. The first will output the data grouped, so you can see the values of r.rating that will be included in the averages for each group.
The second query will apply the average; thereby, reducing each group of records with the same combination of m.title and rvwr,born to one row each, with their average computed over the r.ratings shown in the first query.
match (rvwr:Person)-[r:REVIEWED]->(m:Movie)
where m.released > 2003
return m.title, rvwr.born, r.rating
order by m.title, rvwr.born
match (rvwr:Person)-[r:REVIEWED]->(m:Movie)
where m.released > 2003
return m.title, rvwr.born, avg(r.rating) as rating
order by m.title, rvwr.born
Note: what I described is explicit grouping. There is something called implicit grouping, but this feature has been deprecated. You will see warnings in neo4j browsers when it occurs. You can refactor your code to make the grouping explicit. It also makes your code more understandable.
This may help too:
03-12-2022 02:35 AM
Nice explanation, Gary Lilienfield
All the sessions of the conference are now available online