How does the behaviour of apoc.nodes.collapse work?

I'm currently trying to use the apoc.nodes.collapse procedure to perform entity resolution at query time (I'm doing it this way so that merges are easy to undo in the future).

I'm really struggling to understand how it behaves, and the documentation doesn't explain much.

(I'm currently using APOC version 5.4.1.)

Here is my example code:

This creates my example linked graph:

CREATE (luke:Person {name: "Luke", groupid: 1, uid: "123a", class: 1}),
		(lukes:Person {name: "Luke S", groupid: 1, uid: "099u", class: 2}),
		(lukesky:Person {name: "Luke Sky", groupid: 1, uid: "771p", class: 3}),
		(lukey:Person {name: "Lukey", groupid: 1, uid: "231z", class: 2}),
		(paddy:Person {name: "Paddy", uid: "375y", class: 1}),
		(jacob:Person {name: "jacob", uid: "983w", class: 1}),
		(skywalker:Person {name: "L Skywalker", groupid: 1, uid: "456b", class: 2}),
		(mrsky:Person {name: "Mr Skywalker", groupid: 1, uid: "563x", class: 2}),
		(yoda:Person {name: "Yoda", uid: "12yoda", class: 2}),
		(obiwan:Person {name: "Obi-Wan", groupid: 2, uid: "34obi", class: 3}),
		(kenobi:Person {name: "Kenobi", groupid: 2, uid: "56obi", class: 3}),
		(luke)-[:POSSIBLE_LINK {uid: "789c", class: 3}]->(skywalker),
		(obiwan)-[:POSSIBLE_LINK {uid: "121c", class: 3}]->(kenobi),
		(luke)-[:TRAINED_BY {uid: "983i", class: 3}]->(yoda),
		(skywalker)-[:POSSIBLE_LINK {uid: "922n", class: 3}]->(lukes),
		(lukes)-[:POSSIBLE_LINK {uid: "205q", class: 3}]->(lukesky),
		(luke)-[:POSSIBLE_LINK {uid: "183g", class: 2}]->(lukey),
		(mrsky)-[:POSSIBLE_LINK {uid: "740l", class: 2}]->(luke),
		(paddy)-[:MET_WITH {uid: "288u", class: 2}]->(jacob),
		(mrsky)-[:MET_WITH {uid: "377s", class: 2}]->(obiwan)

premerge.png

This is an example query to merge the nodes that are connected by a POSSIBLE_LINK relationship:

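// Follow POSSIBLE_LINK chains of any length from n to m
MATCH (n)-[:POSSIBLE_LINK*]->(m)
// Keep only chain roots: nodes with no incoming POSSIBLE_LINK
WHERE NOT (n)<-[:POSSIBLE_LINK]-()
// One row per root, with every node reachable from it
WITH n, collect(m) AS linkednodes
WITH linkednodes + [n] AS nodeset
// collapse is called once per row, i.e. once per nodeset
CALL apoc.nodes.collapse(nodeset, {properties: 'combine', mergeVirtualRels: true}) YIELD from, rel, to
RETURN from, rel, to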

Which returns:

psotmerge.png

So the linked Luke Skywalker nodes have been successfully merged/collapsed. However, the result also contains an unmerged instance of one of those nodes, "Mr Skywalker". Why is this?

I can only assume it's because this node has a separate relationship, but I can't see where this is explained in the documentation.
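
One way to check that assumption is to look at the node sets that the query builds before they are handed to apoc.nodes.collapse. The following is only a diagnostic sketch in plain Cypher, assuming the sample graph above; the column names root and nodeset are just labels chosen for illustration.

```cypher
// Same grouping as the merge query, but return the names instead of collapsing
MATCH (n)-[:POSSIBLE_LINK*]->(m)
WHERE NOT (n)<-[:POSSIBLE_LINK]-()
WITH n, collect(m) AS linkednodes
// One row per chain root; each row would be a separate collapse call
RETURN n.name AS root,
       [x IN linkednodes + [n] | x.name] AS nodeset
```

On the sample data this should give two rows, one rooted at "Mr Skywalker" and one rooted at "Obi-Wan", since those are the only nodes with outgoing but no incoming POSSIBLE_LINK relationships. If so, the procedure runs once per row, and the MET_WITH relationship between "Mr Skywalker" and "Obi-Wan" crosses the two groups. That may be why the original "Mr Skywalker" node appears as the far end of a relationship in the second call's result, although I have not confirmed this against the documentation.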

Is there a way to circumvent this behaviour?

Alternatively, is there a better way to do entity resolution that still allows merges to be undone later without too much complexity?
