Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
06-16-2020 03:24 PM
NEO4j 4.0.5 in both Desktop and on a standalone Enterprise single instance in an EC2 VM.
When I run the following query with the SKIP and LIMIT in them I get some duplicates on subsequent pages. (groups of 50).
call{
MATCH (u:User {user_id:218})
OPTIONAL MATCH (u)-[:USER]-(mf:Image)
with u, collect(mf.version_history) as myInts
WITH u, REDUCE(output = [], i IN myInts| output + i) AS my_files
MATCH (u)-[:FAMILY_FRIEND]-(:User)-[ul:USER]-(f:Image)
with collect({node: f, deleted: exists(ul.deleted), vh: f.version_history, vd: f.version_date}) as files, my_files
unwind files as fset
with fset['node'] as f, fset['deleted'] as deleted, fset['vh'] as vh, fset['vd'] as vd, files, my_files
WHERE none(x in files WHERE apoc.coll.intersection(vh, x['vh']) and x['vd'] > vd)
AND not apoc.coll.intersection(vh, my_files) AND not deleted
return f
union
MATCH (:User {user_id:218})-[ul:USER]-(f:Image)
WHERE not exists(ul.deleted)
return f
}
WITH f
ORDER BY f.version_date DESC
SKIP 0 // or 50
LIMIT 50
return f.uid
I tried adding a distinct f.
My understanding is that the new call subquery would remove the issue of skip and limit not working on both parts of the UNION
If I remove the SKIP and LIMIT and return all the results there are no duplicates returned.
The duplicates are not necessarily the end node and first node of the set (It's not an off by 1 issue), It seems to be 1 or mode nodes on each page that are duplicated.
Any ideas?
Solved! Go to Solution.
06-19-2020 10:17 AM
The timestamps of some of the items are the same (They must have been changed as part of the same TXN when they were in Postgres and thus have the same timestamp.) Adding a second order by clause removes the duplicates on the subsequent pages. (Still seems like strange behavior, but at least I know why now.
06-18-2020 08:45 AM
This is holding me up from getting my software working, is there more information I can give to someone to help with this? Is this a bug in 4.0.5?
When I remove the skip and limit it does not return duplicates, but returning thousands of results at a time will not work, they need to be paged, what can I do?
06-19-2020 03:03 AM
Hi Mike,
As union gives non duplicate results you do not need distinct keyword.
However I feel like duplicate records are due to variable name f. In the return it is not hitting combined f.
I am surprised how come SKIP and LIMIT is giving you duplicate results when f itself if distinct result set.
Once you come online we can discuss on this in more detail.
06-19-2020 09:31 AM
So I did a quick test this morning and this works as it should:
call {
UNWIND [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20] as data1
return data1
UNION
UNWIND [7,8,9,17,18,19] as data1
return data1
}
WITH data1
order by data1
skip 10
limit 10
return data1
06-19-2020 09:38 AM
So, I removed all the logic in the first part of my union to try and eliminate where the problem was coming from, but this still has duplicates:
call{
MATCH (u:User {user_id:'218'})
OPTIONAL MATCH (u)-[:USER]-(f:Image)
return f
union
MATCH (:User {user_id:'218'})-[ul:USER]-(f:Image)
WHERE not exists(ul.deleted)
return f
}
WITH f
ORDER BY f.version_date DESC
SKIP 50 // or 50
LIMIT 50
return f.uid
06-19-2020 09:45 AM
Cypher Manual 3.18.1 -- https://neo4j.com/docs/cypher-manual/4.0/clauses/union/
To keep all the result rows, use
UNION ALL
. Using justUNION
will combine and remove duplicates from the result set.
06-19-2020 09:56 AM
Yeah I reread the doc, that's why I deleted my post
06-19-2020 10:02 AM
So I have made the dumbest possibly query (Which you would never do in real life, but to test the issue It was useful.) This returns duplicates. Once again, the duplicates seem random.
call{
MATCH(f:Image)
return f
union
MATCH (f:Image)
return f
}
WITH f
ORDER BY f.version_date DESC
SKIP $skip
LIMIT $limit
RETRUN f.uid
06-19-2020 10:06 AM
Here is one of the items returning duplicates:
Results from: MATCH (f:File{uid:'4147'}) RETURN f
{
"identity": 7247,
"labels": [
"File",
"Image"
],
"properties": {
"has_people": true,
"version_date": "2020-02-06T18:38:00.740160999Z",
"version_history": [
4147
],
"file_size": 0,
"uid": "4147",
"system": "facebook",
"filename": "10155585636624547",
"width": 721,
"thumbnail_system": "wasabi-us-east-2",
"thumbnail_key": "6ced12aa-8b93-44b4-a1e8-42f4ef008cc9.jpg",
"hash": "ee6137545d2e4902e4124a59a5471b75e91c1925a37c0459decbf85886083c71",
"upload_date": "2019-12-18T19:56:46.047438000Z",
"height": 960
}
}
06-19-2020 10:07 AM
So on a whim I changed the order by from the version_date to the uid and suddenly no duplicates.
06-19-2020 10:17 AM
The timestamps of some of the items are the same (They must have been changed as part of the same TXN when they were in Postgres and thus have the same timestamp.) Adding a second order by clause removes the duplicates on the subsequent pages. (Still seems like strange behavior, but at least I know why now.
06-19-2020 10:21 AM
As a followup I reran the query without the union or anything and the duplicates came up, the problem is only related to the dates.
MATCH(f:Image)
WITH f
ORDER BY f.version_date DESC//, f.uid DESC
SKIP $skip
LIMIT $limit
RETURN f.uid, f.version_date
All the sessions of the conference are now available online