Neo4j

lingvisa · 03-26-2021

MATCH (n:Product)
MATCH (m:Product)
WHERE n.name = m.name AND NOT n.id = m.id
DETACH DELETE n

I have nodes with the same name but different custom ids, which shouldn't happen. I want to find those nodes and delete one node from each pair: the two nodes in a pair should be identical, so I only want to keep one of them. My query above deletes both, because each node of a pair can be bound to either n or m. Is there a way to modify it so that only one is deleted?
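To see why both nodes get deleted, here is a minimal plain-Python model of how the two MATCH clauses expand into rows. It only imitates Cypher's row semantics and does not talk to Neo4j; the node dictionaries mirror the sample CREATE data used later in this thread, and the function name is hypothetical.

```python
# Plain-Python model of the two MATCH clauses in the query above.
# It only imitates Cypher's row semantics; nothing here talks to Neo4j.
from itertools import product

# Sample data: same nodes as the CREATE statement used later in this thread.
PRODUCTS = [
    {"id": 0, "name": "A"},
    {"id": 1, "name": "A"},
    {"id": 2, "name": "B"},
    {"id": 3, "name": "B"},
    {"id": 4, "name": "C"},
]


def ids_deleted_by_original_query(nodes):
    """Ids that end up bound to n (and are therefore deleted)."""
    deleted = set()
    # Two independent MATCH clauses produce every ordered (n, m) combination.
    for n, m in product(nodes, repeat=2):
        if n["name"] == m["name"] and n["id"] != m["id"]:
            # Both (0, 1) and (1, 0) pass the filter, so both ids are collected.
            deleted.add(n["id"])
    return deleted


print(sorted(ids_deleted_by_original_query(PRODUCTS)))  # [0, 1, 2, 3]
```

Every duplicate shows up once as n and once as m, which is exactly why the query removes the whole pair.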

koji · 03-26-2021

@lingvisa

Here is some sample data:

CREATE (:Product {id:0, name: 'A'}),
       (:Product {id:1, name: 'A'}),
       (:Product {id:2, name: 'B'}),
       (:Product {id:3, name: 'B'}),
       (:Product {id:4, name: 'C'});

This Cypher query deletes every duplicate node except the first one found for each name.
It may not be elegant, but it works correctly.

MATCH (n:Product)
MATCH (m:Product)
  WHERE n.name = m.name
  AND NOT n.id = m.id
WITH n.name AS name, collect(n.id)[0] AS firstNodeId
MATCH (n:Product)
  WHERE n.name = name
  AND n.id <> firstNodeId
DETACH DELETE n;
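A plain-Python model of the query above may help show what each step does. It is an illustration only, with a hypothetical function name and koji's sample data. One caveat: Cypher does not guarantee the order in which collect() gathers values unless an ORDER BY precedes it, so which duplicate survives may vary; in this model it simply follows list order.

```python
# Plain-Python model of the "keep the first collected id" query above.
from itertools import product

PRODUCTS = [
    {"id": 0, "name": "A"},
    {"id": 1, "name": "A"},
    {"id": 2, "name": "B"},
    {"id": 3, "name": "B"},
    {"id": 4, "name": "C"},
]


def ids_deleted_by_first_id_query(nodes):
    # MATCH (n) MATCH (m) WHERE same name AND different id -> rows of (n, m).
    rows = [(n, m) for n, m in product(nodes, repeat=2)
            if n["name"] == m["name"] and n["id"] != m["id"]]
    # WITH n.name AS name, collect(n.id)[0] AS firstNodeId:
    # one group per name, keeping the first id in this model's row order.
    collected = {}
    for n, _m in rows:
        collected.setdefault(n["name"], []).append(n["id"])
    first_id = {name: ids[0] for name, ids in collected.items()}
    # Second MATCH: delete every node with that name except firstNodeId.
    return {node["id"] for node in nodes
            if node["name"] in first_id and node["id"] != first_id[node["name"]]}


print(sorted(ids_deleted_by_first_id_query(PRODUCTS)))  # [1, 3]
```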

tard_gabriel · 03-26-2021

NOT TESTED

First, retrieve all the distinct names in the database:

MATCH (n)
WITH DISTINCT n.name AS name

Second, match each group of nodes whose name has duplicates, and delete those nodes:

MATCH (n {name:name}) WHERE count(n) > 1
WITH n SKIP 1
DETACH DELETE n

Both statements must go into the same query when you paste them into Neo4j Desktop. An APOC procedure very likely exists for this purpose; those are generally much shorter and more efficient, but less human-friendly to write and read.

lingvisa · 03-27-2021

@tard.gabriel I tried it, but it doesn't seem to work:

MATCH (n)
WITH DISTINCT n.name AS Name
MATCH (n {name:Name}) WHERE count(n) > 1
WITH n SKIP 1
DETACH DELETE n

Invalid use of aggregating function count(...) in this context (line 3, column 29 (offset: 67))
"MATCH (n {name:Name}) WHERE count(n) > 1"

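The error occurs because an aggregating function such as count() has to be computed in a WITH (or RETURN) clause before it can be filtered on; it cannot appear directly in the WHERE of a MATCH. There is also a second issue: WITH n SKIP 1 skips one row of the whole result, not one node per name. The plain-Python sketch below (hypothetical function name, koji's sample data) spells out the intended order of operations: aggregate per name, filter on the count, then skip one node per group.

```python
# Plain-Python model of the intended two-step logic:
# aggregate per name, filter on the count, then skip one node per group.
from collections import defaultdict

PRODUCTS = [
    {"id": 0, "name": "A"},
    {"id": 1, "name": "A"},
    {"id": 2, "name": "B"},
    {"id": 3, "name": "B"},
    {"id": 4, "name": "C"},
]


def ids_deleted_by_grouped_skip(nodes):
    groups = defaultdict(list)
    # Aggregation step: a count is only known once every node is grouped.
    for node in nodes:
        groups[node["name"]].append(node["id"])
    deleted = set()
    for ids in groups.values():
        # Filter on the aggregate (count > 1) after it has been computed.
        if len(ids) > 1:
            # Skip one node per group, not one row of the whole result.
            deleted.update(ids[1:])
    return deleted


print(sorted(ids_deleted_by_grouped_skip(PRODUCTS)))  # [1, 3]
```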
andy_hegedus · 03-27-2021

Just to chime in.

Could the original Cypher query be tweaked a bit to compare the order of the ids? That would prevent both combinations from being true:

MATCH (n:Product)
MATCH (m:Product)
WHERE n.name = m.name AND n.id > m.id
DETACH DELETE n

Andy
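The effect of the n.id > m.id condition can be checked with a small plain-Python model of the row expansion (illustration only; the function name is hypothetical, and it assumes the ids are comparable values such as numbers). Only one of (n, m) and (m, n) passes the filter, so the node with the smallest id in each group is never bound to n and survives, even when a name has more than two copies.

```python
# Plain-Python model of the ordered-pair filter n.id > m.id.
from itertools import product

PRODUCTS = [
    {"id": 0, "name": "A"},
    {"id": 1, "name": "A"},
    {"id": 2, "name": "B"},
    {"id": 3, "name": "B"},
    {"id": 4, "name": "C"},
]


def ids_deleted_by_ordered_pairs(nodes):
    deleted = set()
    for n, m in product(nodes, repeat=2):
        # n.id > m.id holds for only one ordering of each pair.
        if n["name"] == m["name"] and n["id"] > m["id"]:
            deleted.add(n["id"])
    return deleted


print(sorted(ids_deleted_by_ordered_pairs(PRODUCTS)))  # [1, 3]

# Three copies of one name: only the smallest id survives.
TRIPLE = [{"id": 0, "name": "A"}, {"id": 1, "name": "A"}, {"id": 2, "name": "A"}]
print(sorted(ids_deleted_by_ordered_pairs(TRIPLE)))  # [1, 2]
```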

tard_gabriel · 03-27-2021

TESTED

It's the shortest, sweetest and prettiest solution I could come up with.
I think the DISTINCT operator is optional in this case, but I'm not sure.

// Restrict to :Product so unrelated nodes without a name are not grouped together
MATCH (n:Product)
WITH DISTINCT n.name AS name, collect(n) AS nodes
FOREACH (n IN tail(nodes) | DETACH DELETE n )

Keep in mind that the best solution is always to avoid creating duplicates in the first place, by creating uniqueness constraints before importing or creating any data.
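As a rough illustration of what a uniqueness constraint buys you, here is a hypothetical in-memory stand-in written in plain Python. It is not Neo4j code: in Neo4j the constraint is declared with a CREATE CONSTRAINT statement whose exact syntax depends on the server version. The point it shows is that a second write with an existing name is rejected instead of silently creating a duplicate.

```python
# Hypothetical in-memory stand-in for a uniqueness constraint on `name`.
# This is NOT Neo4j code; it only imitates the constraint's behavior.


class UniqueNameStore:
    def __init__(self):
        self._by_name = {}

    def create(self, node_id, name):
        # A uniqueness constraint rejects the write instead of storing a duplicate.
        if name in self._by_name:
            raise ValueError(f"node with name {name!r} already exists")
        self._by_name[name] = node_id

    def ids(self):
        return sorted(self._by_name.values())


store = UniqueNameStore()
for node_id, name in [(0, "A"), (1, "A"), (2, "B")]:
    try:
        store.create(node_id, name)
    except ValueError as err:
        print("rejected:", err)  # rejected: node with name 'A' already exists
print(store.ids())  # [0, 2]
```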

By the way, a note for the APOC developers: it would be great to have a function that removes duplicates based on a node or relationship property value, and not only on the whole entity.

*tail returns every element of a list except the first one
*collect creates a list from all the matching nodes, here one list per name
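The collect()/tail() approach can be mirrored in a few lines of plain Python (illustration only; the function name is hypothetical and the data is koji's sample). Grouping by name yields one list per name, and tail() of a one-element list is empty, so names without duplicates are left alone without any explicit count check.

```python
# Plain-Python model of: WITH n.name AS name, collect(n) AS nodes
#                        FOREACH (n IN tail(nodes) | DETACH DELETE n)
from collections import defaultdict

PRODUCTS = [
    {"id": 0, "name": "A"},
    {"id": 1, "name": "A"},
    {"id": 2, "name": "B"},
    {"id": 3, "name": "B"},
    {"id": 4, "name": "C"},
]


def ids_deleted_by_collect_tail(nodes):
    groups = defaultdict(list)
    for node in nodes:
        # .get() returns None for a node without a name, so all such nodes share one group.
        groups[node.get("name")].append(node["id"])
    deleted = set()
    for ids in groups.values():
        # tail(): everything but the first element (empty for single nodes).
        deleted.update(ids[1:])
    return deleted


print(sorted(ids_deleted_by_collect_tail(PRODUCTS)))  # [1, 3]
```

Because each name already produces exactly one group, DISTINCT should not change the result here. Cypher also groups null keys together, which is why running this over every node in the database, rather than only :Product nodes, could delete unrelated nodes that simply lack a name.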

If you found this solution helpful, please mark it as the solution; this helps me provide more solutions in the future.
