Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
11-06-2019 05:49 AM
In my db are some nodes which have this form:
n:Contact
{
"name": "name surname",
"created_timestampe": 1460411261,
"updated_timestamp": 1494083876,
"contact_point": [
"+39*********"
]
}
but there are some nodes where I have some values for the keys name and contact_point which are duplicated. I don't care about created_timestamp or updated_timestamp, but I want to delete this duplicates checking if the contact_point value is really the same. I wrote this, but I'm not sure if it's correct:
MATCH (c:Contact)
WITH c.contact_point AS contact_point, COLLECT(c) AS contacts
WHERE SIZE(contacts) > 1
FOREACH (n IN contacts| DETACH DELETE n);
Solved! Go to Solution.
11-12-2019 01:54 PM
I am assuming that you're looking for the combination of the nodes above that would be the union of properties based on a common name
property into one node with a distinct name
. Always an interesting question on how to combine things in a graph when a classic MERGE
statement does not quiet fit. Fortunately there's an apoc procedure apoc.refactor.mergeNodes that can help. The Cypher query below walks through the list of nodes by name and uses apoc.refactor.mergeNodes
to decide what to do with the properties. I changed the created_timestampe values to unique integers so that you can see that there's not an ordering. That doesn't seem to apply to this case, but is good to know. A test set of data is below the query and the results.
One other thought is to clean the data before as part of the load process which is usually easier to do.
// apoc.refactor.mergeNodes() to combine nodes by name and handle the node properties
// note: indexes are 0 based
MATCH (c:Contact)
WITH c.name AS name, COLLECT(c) AS contacts,
SIZE(COLLECT(c)) AS nbr_nodes
WHERE SIZE(contacts) > 1 // only want names that have more than one node
// uncomment the RETURN and delete lines below it to see the grouping
// RETURN name, contacts, SIZE(contacts)
WITH name as name, contacts as contacts, nbr_nodes AS nbr_nodes
UNWIND RANGE(1, nbr_nodes - 1) as idx
CALL apoc.refactor.mergeNodes([contacts[0], contacts[idx]], {properties: {name:'combine', created_timestampe:'overwrite', updated_timestamp:'overwrite', contact_point:'combine'}}) YIELD node
RETURN name, max(idx) + 1 AS `Nbr Nodes Merged`
OUTPUT
╒═══════╤══════════════════╕
│"name" │"Nbr Nodes Merged"│
╞═══════╪══════════════════╡
│"Mamà" │3 │
├───────┼──────────────────┤
│"Kiddo"│2 │
├───────┼──────────────────┤
│"Papà" │4 │
└───────┴──────────────────┘
MATCH(n) RETURN n
╒══════════════════════════════════════════════════════════════════════╕
│"n" │
╞══════════════════════════════════════════════════════════════════════╡
│{"name":"Mamà","created_timestampe":5,"updated_timestamp":1520798382,"│
│contact_point":"+39*********332"} │
├──────────────────────────────────────────────────────────────────────┤
│{"name":"Kiddo","created_timestampe":9,"updated_timestamp":1520798382,│
│"contact_point":["1234567","abcdefg"]} │
├──────────────────────────────────────────────────────────────────────┤
│{"name":"Papà","created_timestampe":3,"updated_timestamp":1520798382,"│
│contact_point":["plus-minus-plus-minus","+39********009","+39********0│
│0X","39*********332","+39*********332"]} │
├──────────────────────────────────────────────────────────────────────┤
│{"name":"The Dag","created_timestampe":10,"updated_timestamp":15207983│
│82,"contact_point":["food dish"]} │
└──────────────────────────────────────────────────────────────────────┘
// Test Data
CREATE (:Contact {
name: "Papà",
created_timestampe: 1,
updated_timestamp: 1493240714,
contact_point: [
"+39********009",
"+39********00X"
]
} ),
(:Contact {
name: "Papà",
created_timestampe: 2,
updated_timestamp: 1493240714,
contact_point: [
"+39********009",
"39*********332"
]
}),
(:Contact {
name: "Papà",
created_timestampe: 3,
updated_timestamp: 1520798382,
contact_point: [
"+39*********332"
]
}),
(:Contact {
name: "Papà",
created_timestampe: 4,
updated_timestamp: 1520798382,
contact_point: [
"plus-minus-plus-minus"
]
}),
(:Contact {
name: "Mamà",
created_timestampe: 5,
updated_timestamp: 1520798382,
contact_point: [
"+39*********332"
]
}),
(:Contact {
name: "Mamà",
created_timestampe: 6,
updated_timestamp: 1520798382,
contact_point: [
"+39*********332"
]
}),
(:Contact {
name: "Mamà",
created_timestampe: 7,
updated_timestamp: 1520798382,
contact_point: [
"+39*********332"
]
}),
(:Contact {
name: "Kiddo",
created_timestampe: 8,
updated_timestamp: 1520798382,
contact_point: [
"1234567"
]
}),
(:Contact {
name: "Kiddo",
created_timestampe: 9,
updated_timestamp: 1520798382,
contact_point: [
"abcdefg"
]
}),
(:Contact {
name: "The Dag",
created_timestampe: 10,
updated_timestamp: 1520798382,
contact_point: [
"food dish"
]
})
11-06-2019 07:32 AM
That looks pretty close, but right now it will delete all of the nodes when presumably you want to delete all but one of them?
The code below does what you want:
MATCH (c:Contact)
WITH c.contact_point AS contact_point, COLLECT(c) AS contacts
WHERE SIZE(contacts) > 1
UNWIND contacts[1..] AS contact
DETACH DELETE contact
11-06-2019 07:41 AM
Yes, In fact the code that I wrote deleted all the nodes where size(contact)=1. The code that you write works great, but after I execute it i have a problem. In fact, for example, I have this 3 node
{
"name": "Papà",
"created_timestampe": 1442311441,
"updated_timestamp": 1493240714,
"contact_point": [
"+39********009"
]
}
{
"name": "Papà",
"created_timestampe": 1442194672,
"updated_timestamp": 1493240714,
"contact_point": [
"+39********009",
"39*********332"
]
}
{
"name": "Papà",
"created_timestampe": 1515581146,
"updated_timestamp": 1520798382,
"contact_point": [
"+39*********332"
]
}
If I want to delete also the first node and the third node, because the second one contains both, how i should do ?
11-12-2019 01:54 PM
I am assuming that you're looking for the combination of the nodes above that would be the union of properties based on a common name
property into one node with a distinct name
. Always an interesting question on how to combine things in a graph when a classic MERGE
statement does not quiet fit. Fortunately there's an apoc procedure apoc.refactor.mergeNodes that can help. The Cypher query below walks through the list of nodes by name and uses apoc.refactor.mergeNodes
to decide what to do with the properties. I changed the created_timestampe values to unique integers so that you can see that there's not an ordering. That doesn't seem to apply to this case, but is good to know. A test set of data is below the query and the results.
One other thought is to clean the data before as part of the load process which is usually easier to do.
// apoc.refactor.mergeNodes() to combine nodes by name and handle the node properties
// note: indexes are 0 based
MATCH (c:Contact)
WITH c.name AS name, COLLECT(c) AS contacts,
SIZE(COLLECT(c)) AS nbr_nodes
WHERE SIZE(contacts) > 1 // only want names that have more than one node
// uncomment the RETURN and delete lines below it to see the grouping
// RETURN name, contacts, SIZE(contacts)
WITH name as name, contacts as contacts, nbr_nodes AS nbr_nodes
UNWIND RANGE(1, nbr_nodes - 1) as idx
CALL apoc.refactor.mergeNodes([contacts[0], contacts[idx]], {properties: {name:'combine', created_timestampe:'overwrite', updated_timestamp:'overwrite', contact_point:'combine'}}) YIELD node
RETURN name, max(idx) + 1 AS `Nbr Nodes Merged`
OUTPUT
╒═══════╤══════════════════╕
│"name" │"Nbr Nodes Merged"│
╞═══════╪══════════════════╡
│"Mamà" │3 │
├───────┼──────────────────┤
│"Kiddo"│2 │
├───────┼──────────────────┤
│"Papà" │4 │
└───────┴──────────────────┘
MATCH(n) RETURN n
╒══════════════════════════════════════════════════════════════════════╕
│"n" │
╞══════════════════════════════════════════════════════════════════════╡
│{"name":"Mamà","created_timestampe":5,"updated_timestamp":1520798382,"│
│contact_point":"+39*********332"} │
├──────────────────────────────────────────────────────────────────────┤
│{"name":"Kiddo","created_timestampe":9,"updated_timestamp":1520798382,│
│"contact_point":["1234567","abcdefg"]} │
├──────────────────────────────────────────────────────────────────────┤
│{"name":"Papà","created_timestampe":3,"updated_timestamp":1520798382,"│
│contact_point":["plus-minus-plus-minus","+39********009","+39********0│
│0X","39*********332","+39*********332"]} │
├──────────────────────────────────────────────────────────────────────┤
│{"name":"The Dag","created_timestampe":10,"updated_timestamp":15207983│
│82,"contact_point":["food dish"]} │
└──────────────────────────────────────────────────────────────────────┘
// Test Data
CREATE (:Contact {
name: "Papà",
created_timestampe: 1,
updated_timestamp: 1493240714,
contact_point: [
"+39********009",
"+39********00X"
]
} ),
(:Contact {
name: "Papà",
created_timestampe: 2,
updated_timestamp: 1493240714,
contact_point: [
"+39********009",
"39*********332"
]
}),
(:Contact {
name: "Papà",
created_timestampe: 3,
updated_timestamp: 1520798382,
contact_point: [
"+39*********332"
]
}),
(:Contact {
name: "Papà",
created_timestampe: 4,
updated_timestamp: 1520798382,
contact_point: [
"plus-minus-plus-minus"
]
}),
(:Contact {
name: "Mamà",
created_timestampe: 5,
updated_timestamp: 1520798382,
contact_point: [
"+39*********332"
]
}),
(:Contact {
name: "Mamà",
created_timestampe: 6,
updated_timestamp: 1520798382,
contact_point: [
"+39*********332"
]
}),
(:Contact {
name: "Mamà",
created_timestampe: 7,
updated_timestamp: 1520798382,
contact_point: [
"+39*********332"
]
}),
(:Contact {
name: "Kiddo",
created_timestampe: 8,
updated_timestamp: 1520798382,
contact_point: [
"1234567"
]
}),
(:Contact {
name: "Kiddo",
created_timestampe: 9,
updated_timestamp: 1520798382,
contact_point: [
"abcdefg"
]
}),
(:Contact {
name: "The Dag",
created_timestampe: 10,
updated_timestamp: 1520798382,
contact_point: [
"food dish"
]
})
11-16-2019 01:13 PM
@dan.flavin1 yes, my idea is to combine the nodes which have the same name. Thanks a lot ! It's works perfectly !
All the sessions of the conference are now available online