
Heads up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Merge nodes without deleting original

mkretsch
Node Clone

As we work across a multitude of data sources, we are constantly merging duplicate nodes. However, it's very easy to "over merge" or merge incorrectly and regret it after the merge has already taken place. To avoid this, we are hoping to create the following model:

(a1)-[r]->(b)
(a2)-[s]->(c)

We want to merge a1 and a2, including all their relationships, but still keep a1 and a2:

(b)<-[r]-(aMerge)-[s]->(c)
(a1:Merged)-[:MERGED]->(aMerge)
(a2:Merged)-[:MERGED]->(aMerge)
(a1:Merged)-[r]->(b)
(a2:Merged)-[s]->(c)

That way, if aMerge turns out to be an incorrect merge, we can simply delete aMerge and be left with the original nodes and relationships. In our visualization software, labeling a1 and a2 as Merged hides them from the screen, so they won't clutter the view.
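Under this model, undoing a bad merge is a small query. The sketch below assumes the names from the model above (the Merged label and the MERGED relationship type) and uses $mergeId as a hypothetical parameter holding the internal id of aMerge. It un-hides the originals and then deletes aMerge with all of its relationships; the originals' own relationships are never touched.

```cypher
// Sketch: undo an incorrect merge under the model above.
// $mergeId is a hypothetical parameter holding the internal id of aMerge.
MATCH (m) WHERE id(m) = $mergeId
// Find the originals that were folded into aMerge (there may be none)
OPTIONAL MATCH (orig:Merged)-[:MERGED]->(m)
WITH m, collect(orig) AS originals
// Un-hide the originals by removing the Merged label
FOREACH (o IN originals | REMOVE o:Merged)
// Delete aMerge together with every relationship attached to it
DETACH DELETE m
```

Note that if one original has been folded into more than one merged node, this removes its Merged label even though another merged node still references it.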

In the past I've used APOC to merge nodes like this:
MATCH (a1), (a2)
WHERE id(a1) = {{"Source":node}} AND id(a2) = {{"Target":node}}
WITH head(collect([a1,a2])) as nodes
CALL apoc.refactor.mergeNodes(nodes,{
properties: "combine",
mergeRels:true
})
YIELD node
RETURN node

I'm having trouble using APOC's clone procedure because I have a unique constraint on uuid, so cloning fails when it tries to copy the uuid.

How can I either clone a node while ignoring its uuid, or merge nodes in APOC without deleting the originals? Any solutions? Thanks in advance!
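For the clone half of the question: in APOC versions whose apoc.refactor.cloneNodes accepts an optional third skipProperties argument (an assumption here; check the documentation for your installed version), the uuid can be left off the copies so the unique constraint is never hit. In this sketch, $sourceId and $targetId are hypothetical parameters standing in for the templated ids in the query above.

```cypher
// Sketch: clone a1 and a2 with their relationships, without copying uuid.
// Assumes cloneNodes(nodes, withRelationships, skipProperties) is available.
MATCH (a1), (a2)
WHERE id(a1) = $sourceId AND id(a2) = $targetId
CALL apoc.refactor.cloneNodes([a1, a2], true, ['uuid'])
YIELD input, output, error
// input = id of the original, output = the clone, error = message if cloning failed
RETURN input, output, error
```

The clones have no uuid at all, so you may want to assign a fresh one afterwards.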

2 REPLIES

@mkretsch
Currently, apoc.refactor.mergeNodes has no option to keep the original nodes.
I suggest opening an issue for this on the neo4j-contrib/neo4j-apoc-procedures GitHub repository.

That said, there is a workaround using apoc.nodes.collapse (see "Nodes collapse" in the APOC documentation), which is the equivalent of mergeNodes but works with virtual entities (see "Virtual Nodes/Rels" in the APOC documentation).

With a small dataset like this:

CREATE (a1:Yota {uuid: 1})-[r:AAA]->(b:Other),  (a1)-[:EEE]->(:Foo),
(a2:Yota {uuid: 2})-[s:EEE]->(c:Baz)

you could do:

MATCH (n:Yota) 
WITH collect(n) as nodes
CALL apoc.nodes.collapse(nodes,{  // create a virtual merged node; the original nodes are kept
properties: "combine",
mergeRels:true, 
countMerge: false
})
YIELD from, rel, to
WITH from, rel, to, nodes
UNWIND nodes as nodeOriginal
CALL apoc.merge.node(apoc.node.labels(from), apoc.any.properties(from)) yield node  // convert merge node to real
WITH node, rel, to, nodeOriginal
CALL apoc.merge.relationship(node, 'MERGED_NODE', {}, {}, nodeOriginal) YIELD rel as relCreated // create relationship from new merged node and original node
RETURN *

mkretsch
Node Clone

This was very helpful and got me most of the way there. The issue is that the merged node does not seem to have the relationships. I start with:
(a1:Yota {uuid: 1})-[r:AAA]->(b:Other), (a1)-[:EEE]->(:Foo),
(a2:Yota {uuid: 2})-[s:EEE]->(c:Baz)

But the merged node only has the following relationships:
(mergeda1)-[:MERGED_NODE]->(a1:Yota {uuid: 1})
(mergeda1)-[:MERGED_NODE]->(a2:Yota {uuid: 2})

The merged node is not connected to (c:Baz), (b:Other), or (d:Foo).

Any recommendations for changes so that the merged node also gets the merged relationships?
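In the collapse query, `rel` and `to` are carried through the WITH clauses but never turned into real relationships; apoc.merge.node only creates the node itself, so the virtual relationships are lost. One way around this, sketched below, avoids virtual entities altogether: clone the originals (skipping uuid), merge only the clones with mergeNodes, and link the untouched originals to the result. This assumes an APOC version whose cloneNodes accepts the skipProperties argument, reuses the sample :Yota data above, and follows the question's direction, (a1:Merged)-[:MERGED]->(aMerge), rather than MERGED_NODE. It is a sketch, not an official APOC recipe.

```cypher
// Sketch: build aMerge from copies so the originals keep their own relationships.
// Assumes cloneNodes(nodes, withRelationships, skipProperties) is available.
MATCH (n:Yota)
WITH collect(n) AS originals
// 1. Clone every original with its relationships, skipping the unique uuid
CALL apoc.refactor.cloneNodes(originals, true, ['uuid'])
YIELD output
WITH originals, collect(output) AS clones
// 2. Merge only the clones; mergeNodes keeps one clone (aMerge) and deletes the rest
CALL apoc.refactor.mergeNodes(clones, {properties: 'combine', mergeRels: true})
YIELD node AS aMerge
// 3. Link each original to aMerge and label it so the UI can hide it
UNWIND originals AS orig
MERGE (orig)-[:MERGED]->(aMerge)
SET orig:Merged
RETURN aMerge, collect(orig) AS mergedFrom
```

Because uuid was skipped, aMerge has no uuid; set a new one if your model requires it. Deleting aMerge later (DETACH DELETE) removes only the copied relationships.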