Heads up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
01-19-2023 08:48 PM - edited 01-19-2023 08:49 PM
My Cypher query does not detect duplicate (repeated) nodes that exist under different labels. The output it gives me is: (no changes, no records)
MATCH (n)
WHERE n.name = "Joining"
WITH n, COUNT(n) as count
WHERE count > 1
RETURN n
However, when I search for the same node as an individual entity using the query below, it returns all the available duplicates under the different labels.
MATCH (n)
WHERE n.name = "Joining"
// WITH n, COUNT(n) as count
// WHERE count > 1
RETURN n
Can anyone explain why, and suggest how best to go about it? Thanks.
#Neo4J #Cypher #nodeduplicates
01-19-2023 08:57 PM
Simple mistake: you are grouping by 'n', which is the node itself. Because every node is distinct, the result is a separate row for each n with a count of one, so the WHERE count > 1 filter discards every row.
What you want to do is group on the common value, which is n.name.
MATCH (n)
WHERE n.name = "Joining"
WITH n.name as name, COLLECT(n) as duplicates
WHERE size(duplicates) > 1
RETURN duplicates
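To see the grouping behavior described above for yourself, the two statements below can be run separately (for example in Neo4j Browser). They are a diagnostic sketch, not part of the fix. The first keeps n as the grouping key, as the original query does, so each returned row should show a count of 1; the second has no grouping key, so count(n) aggregates over all matched nodes and returns a single total.

```cypher
// Statement 1: n is the grouping key, so each node forms its own group
// and every row reports count = 1 (this is why WHERE count > 1 found nothing).
MATCH (n)
WHERE n.name = "Joining"
RETURN n, count(n) AS count;

// Statement 2: no grouping key, so count(n) aggregates over all matched
// nodes and a single row with the total number of "Joining" nodes comes back.
MATCH (n)
WHERE n.name = "Joining"
RETURN count(n) AS total;
```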
01-19-2023 09:00 PM
Using the collect approach worked:
MATCH (n)
WHERE n.name = "specific name"
WITH COLLECT(n) as nodes
WHERE SIZE(nodes) > 1
RETURN nodes
01-19-2023 09:02 PM
You are correct: the 'WITH n.name AS name' grouping I used is not necessary, since the name is the same for every matched node. It can be removed, which gives exactly the query you wrote.
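A possible extension of the collect approach, since the original question is about duplicates under different labels: this sketch also reports the labels carried by each duplicate. It assumes the name is supplied as a query parameter called $name (a hypothetical name chosen for this sketch). It uses id() to match the rest of the thread; in Neo4j 5, id() is deprecated and elementId() is the suggested replacement.

```cypher
// Collect every node with the given name; $name is a hypothetical parameter.
MATCH (n)
WHERE n.name = $name
WITH collect(n) AS nodes
// Keep the result only if more than one node shares the name.
WHERE size(nodes) > 1
// For each duplicate, return its internal id and its list of labels.
RETURN [x IN nodes | {id: id(x), labels: labels(x)}] AS duplicates
```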
01-19-2023 09:34 PM
Thanks so much, you are a lifesaver. How about a general scenario for checking duplicates, as I illustrated below?
01-19-2023 09:15 PM
Is there a way to look for duplicate nodes in general? Something like this?
// this doesn't work, though
MATCH (n)
WITH n, COUNT(n) as count
WHERE count > 1
WITH COLLECT(n) as nodes
RETURN nodes
01-19-2023 10:02 PM - edited 01-19-2023 10:04 PM
You need to group the nodes by the values that make them duplicates. Suppose two nodes count as duplicates when their duplicateProperty1 and duplicateProperty2 values are both equal; for example, two Person nodes might be duplicates if they have the same email address and phone number. Using duplicateProperty1 and duplicateProperty2 as the defining values, the following query groups the nodes that share those values and returns their node ids in a list. You can return whatever you need instead.
match(n)
with n.duplicateProperty1 as property1, n.duplicateProperty2 as property2, collect(id(n)) as ids
where size(ids) > 1
return property1, property2, ids
[Screenshot: test data]
[Screenshot: result of grouping by both properties]
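Applying the same grouping idea to the general question, here is a sketch that treats nodes sharing a name as duplicates, whatever their labels. That name alone identifies a duplicate is an assumption for this sketch; swap in the property or properties that define a duplicate in your data, as in the duplicateProperty1/duplicateProperty2 query. Note that MATCH (n) scans every node, so this can be slow on a large graph.

```cypher
MATCH (n)
// Skip nodes without a name; otherwise they could all end up grouped under null.
WHERE n.name IS NOT NULL
// n.name is the grouping key, so all nodes with the same name form one group.
WITH n.name AS name, collect(n) AS nodes
WHERE size(nodes) > 1
// Report each duplicated name, how many nodes share it, and their labels.
RETURN name,
       size(nodes) AS copies,
       [x IN nodes | labels(x)] AS labelsPerNode
ORDER BY copies DESC
```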