Neo4j

Reuben · ‎01-19-2023

My Cypher query does not detect duplicate (repeated) nodes that exist under different labels. It returns nothing: (no changes, no records)

MATCH (n)
WHERE n.name = "Joining"
WITH n, COUNT(n) as count
WHERE count > 1
RETURN n

However, when I search for the same node on its own with the query below, I get all the duplicates under the different labels.

MATCH (n)
WHERE n.name = "Joining"
// WITH n, COUNT(n) as count
// WHERE count > 1
RETURN n

Please can anyone explain why and suggest how best to go about it? Thanks

#Neo4J #Cypher #nodeduplicates

@glilienfield

glilienfield · ‎01-19-2023

Simple mistake: you are grouping by 'n', which is the node itself. Every node forms its own group, so the result is a separate row for each n with a count of one, and the 'count > 1' filter then removes every row.

What you want to do is group on the common value, which is n.name.

MATCH (n)
WHERE n.name = "Joining"
WITH n.name as name, COLLECT(n) as duplicates
WHERE size(duplicates) > 1
RETURN duplicates
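To see the difference between the two grouping keys side by side, here is a minimal sketch. It assumes an otherwise empty scratch database, and the labels Task and Event are made up purely for illustration. Each statement is meant to be run separately.

```cypher
// Hypothetical test data: two nodes with different labels but the same name.
CREATE (:Task {name: "Joining"}), (:Event {name: "Joining"});

// Grouping key is the node itself: each node is its own group,
// so cnt is 1 on every row and a "cnt > 1" filter would drop them all.
MATCH (n)
WHERE n.name = "Joining"
RETURN n, count(n) AS cnt;

// Grouping key is the shared value: the nodes fall into one group.
MATCH (n)
WHERE n.name = "Joining"
RETURN n.name AS name, count(n) AS cnt, collect(labels(n)) AS labelSets;
```

In Cypher, every non-aggregated expression in a WITH or RETURN clause acts as a grouping key, which is why swapping n for n.name changes the count.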

Reuben · ‎01-19-2023

Using the collect approach worked:

MATCH (n)
WHERE n.name = "specific name"
WITH COLLECT(n) as nodes
WHERE SIZE(nodes) > 1
RETURN nodes

glilienfield · ‎01-19-2023

You are correct: the 'WITH n.name' grouping I used is not necessary, since the WHERE clause already restricts every matched node to the same name. It can be removed, which gives the query you posted.

Reuben · ‎01-19-2023

Thanks so much, you are a life saver. How about a general scenario for checking duplicates, as I illustrate below?

Reuben · ‎01-19-2023

Is there a way to look for duplicate nodes in general? Something like this?

// this doesn't work though*
MATCH (n)
WITH n, COUNT(n) as count
WHERE count > 1
WITH COLLECT(n) as nodes
RETURN nodes

glilienfield · ‎01-19-2023

You need to group the nodes by the values that make them duplicates. Suppose two nodes count as duplicates when their duplicateProperty1 and duplicateProperty2 values are both equal. For example, two Person nodes might be duplicates if they share the same email address and phone number. The following query groups the nodes that have the same values for those two properties and returns their node ids in a list. You can return whatever you need instead.

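MATCH (n)
// Non-aggregated expressions in WITH act as the grouping keys
WITH n.duplicateProperty1 AS property1, n.duplicateProperty2 AS property2, collect(id(n)) AS ids
// Keep only groups that contain more than one node
WHERE size(ids) > 1
RETURN property1, property2, ids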

Test data: (image not available)

Result of grouping by both properties: (image not available)
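One thing to watch with a label-free MATCH (n): nodes that lack both properties all share the grouping key (null, null) and would be reported as one large group of "duplicates". A hedged variant that skips such nodes is sketched below. The property names are the same placeholders used above, and elementId() assumes Neo4j 5 or later (on 4.x, id(x) plays the same role).

```cypher
// Assumption: a node missing either property should not count as a duplicate.
MATCH (n)
WHERE n.duplicateProperty1 IS NOT NULL
  AND n.duplicateProperty2 IS NOT NULL
// Group by the two values that define a duplicate.
WITH n.duplicateProperty1 AS property1,
     n.duplicateProperty2 AS property2,
     collect(n) AS nodes
// Keep only groups that contain more than one node.
WHERE size(nodes) > 1
RETURN property1,
       property2,
       [x IN nodes | labels(x)] AS labelSets,
       [x IN nodes | elementId(x)] AS elementIds
```

Returning the label sets alongside the ids makes it easier to see which duplicates live under different labels, which was the original question.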

Reuben · ‎01-19-2023

Thank you as usual @glilienfield

Not detecting repeated nodes