02-06-2023 02:57 PM
I'm currently trying to use the apoc.nodes.collapse procedure to perform entity resolution during a query (I'm doing it this way to make it simpler to undo merges in the future).
I'm really struggling to understand how it behaves, and the documentation doesn't explain much.
(I'm currently using apoc version 5.4.1)
Here is my example code:
This creates my example linked graph:
CREATE (luke:Person {name: "Luke", groupid: 1, uid: "123a", class: 1}),
(lukes:Person {name: "Luke S", groupid: 1, uid: "099u", class: 2}),
(lukesky:Person {name: "Luke Sky", groupid: 1, uid: "771p", class: 3}),
(lukey:Person {name: "Lukey", groupid: 1, uid: "231z", class: 2}),
(paddy:Person {name: "Paddy", uid: "375y", class: 1}),
(jacob:Person {name: "jacob", uid: "983w", class: 1}),
(skywalker:Person {name: "L Skywalker", groupid: 1, uid: "456b", class: 2}),
(mrsky:Person {name: "Mr Skywalker", groupid: 1, uid: "563x", class: 2}),
(yoda:Person {name: "Yoda", uid: "12yoda", class: 2}),
(obiwan:Person {name: "Obi-Wan", groupid: 2, uid: "34obi", class: 3}),
(kenobi:Person {name: "Kenobi", groupid: 2, uid: "56obi", class: 3}),
(luke)-[:POSSIBLE_LINK {uid: "789c", class: 3}]->(skywalker),
(obiwan)-[:POSSIBLE_LINK {uid: "121c", class: 3}]->(kenobi),
(luke)-[:TRAINED_BY {uid: "983i", class: 3}]->(yoda),
(skywalker)-[:POSSIBLE_LINK {uid: "922n", class: 3}]->(lukes),
(lukes)-[:POSSIBLE_LINK {uid: "205q", class: 3}]->(lukesky),
(luke)-[:POSSIBLE_LINK {uid: "183g", class: 2}]->(lukey),
(mrsky)-[:POSSIBLE_LINK {uid: "740l", class: 2}]->(luke),
(paddy)-[:MET_WITH {uid: "288u", class: 2}]->(jacob),
(mrsky)-[:MET_WITH {uid: "377s", class: 2}]->(obiwan)
This is an example query to merge the nodes that have a "POSSIBLE_LINK" relationship:
// Find every node reachable via POSSIBLE_LINK from a root,
// i.e. a node with no incoming POSSIBLE_LINK
MATCH (n)-[:POSSIBLE_LINK*]->(m)
WHERE NOT (n)<-[:POSSIBLE_LINK]-()
WITH n, collect(m) AS linkednodes
// Add the root itself to the set of nodes to collapse
WITH linkednodes + [n] AS nodeset
CALL apoc.nodes.collapse(nodeset, {properties: 'combine', mergeVirtualRels: true}) YIELD from, rel, to
RETURN from, rel, to
Which returns:
So the linked Luke Skywalker nodes have been successfully merged/collapsed. However, the result also includes an unmerged instance of one of those nodes, "Mr Skywalker". Why is this?
I can only assume it's because this node has a separate relationship, but I can't see where this is explained in the documentation.
Is there a way to circumvent this behaviour?
Alternatively, is there a better way to do entity resolution that still makes it reasonably simple to undo merges at a later date?
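To help narrow this down, here is a minimal sketch of a diagnostic query (plain Cypher, no APOC) that lists the node sets the query above would hand to apoc.nodes.collapse, one row per root node. The aliases root and nodeset are illustrative names only, and DISTINCT is added just to keep the listing readable.

```cypher
// Same root-finding logic as the collapse query, but only listing names
MATCH (n)-[:POSSIBLE_LINK*]->(m)
WHERE NOT (n)<-[:POSSIBLE_LINK]-()
WITH n, collect(DISTINCT m) AS linkednodes
// One row per root: the root's name and the names of all nodes in its set
RETURN n.name AS root,
       [x IN linkednodes + [n] | x.name] AS nodeset
```

If this returns more than one row, the procedure is being called once per row. My assumption, which I have not confirmed from the documentation, is that a node collapsed in one call can still appear as an ordinary node in the output of another call when a relationship such as MET_WITH connects the two groups.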