Neo4j

aerfan · ‎05-17-2022

I'm using neo4j version 4.4.6 and I have a cypher query which return nodes with relations and using the pagination but when I add ORDER BY the query return dublicated nodes in some pages
for example the last node in page 2 is the same first node in page 3
My query

MATCH (b:Business)-[r:PURCHASE]->(photo:Photo)<-[:UPLOADS]-(p:Photographer)
WHERE photo.deleted_at IS null
WITH COLLECT(DISTINCT p) AS top, COUNT(DISTINCT p) AS total, COUNT(DISTINCT r) AS r
UNWIND top as p
match (p)-[:UPLOADS]->(uphoto:Photo)
WHERE  uphoto.deleted_at is null
with count(r) as r_count, p, collect(DISTINCT uphoto)[..4] as photos, total
RETURN r_count, photos, p, total
ORDER BY r_count DESC
SKIP 0 LIMIT 3

So this will return dublicated p nodes
And to prevent dublication I added another value in ordering to be like this
ORDER BY r_count DESC, p.nodeId DESC
Is this correct or there is another solution?

glilienfield · ‎05-18-2022

Is this the entire query, or a snippet? Reviewing the logic, I don't feel it should be returning duplicate photographers. In review, the first 'with' clause consists of all aggregate functions, so it will collect those attributes across all rows and return one row as the result. Since you are collecting distinct photographers, the 'top' list will have the photographers that uploaded photos that were purchased by a business, without duplicates.

Following your unwind of top, the result is a row for each distinct photographer, with the values of 'total' and 'r' repeated for each photographer.

Following the 'match' on photographers that uploaded a photo, the rows will expend. All the photographers will still be in the result, since each of the photographers in top is known to have uploaded photos. The results of all the matches will have the value of 'r' and 'total' appended to the result of each row. You now have rows with a photographer, an uploaded photo, total, and r.

The result of the 'with' clause is to group the rows by photographer and total, since those attributes are not part of an aggregate expression. Since the 'total' value is the same for all rows, you are effectively grouping on each photographer. The result is a row for each photographer, the count of the number of rows for that photographer, the reduced collection of their photos, and the same value of total.

From this, it seems that the results should be one row for each photographer. This is even regardless of you use of 'distinct p', as the groping by 'p' and 'total' in the 'with' clause will remove any duplicate 'p' values since the other grouping attribute 'total' is the same for all 'p' values. You would get duplicate photos though, so you count_r would have been inflated.

Interesting, the value of 'r_count' that is calculated ends up to nothing to do with 'r' from the original match. The value of 'r_count' is equal to the number of photographs the photographer uploaded, which was determined in the second match. Is this what you intend? If so, the query could be simplified. You could remove the definition of 'r' in the first 'with' clause, and replace 'count(r) as r_count' with 'count(*) as r_count'.

Does this sound logical, or have I misinterpreted something?

Neo4j

Ordering in neo4j query make duplication in nodes while using pagination