01-05-2021 11:02 PM
Hello - I have a set of nodes where multiple values within a property have been delimited by a semicolon. These nodes already have relationships to other nodes. I would like to split each node up based on the delimiter and preserve the existing relationships. Clearly this should have been resolved before the data load; however, multiple fields throughout my data set have the same issue and I'm keen to reconcile it.
Here's an example of the data, with fields related to countries. The values are stored on nodes labeled :Country, in a property named "country".
"qa; us; ca; fr"
"kw; us"
I would like to split these into individual country codes, so I've written this in Cypher:
MATCH (n:Country) WHERE EXISTS(n.country)
WITH n, split(n.country, ";") AS countryarray
UNWIND countryarray AS acountry
RETURN n.country, trim(acountry);
That code returns well-behaved output:
"qa; us; ca; fr" "qa"
"qa; us; ca; fr" "us"
"qa; us; ca; fr" "ca"
"qa; us; ca; fr" "fr"
"kw; us" "kw"
"kw; us" "us"
I have multiple other nodes that connect to the nodes with delimited properties. How can I take the split results and refactor the nodes to preserve the existing relationships?
Thank you and Happy New Year.
01-06-2021 06:06 AM
Hello Brian,
Here is a solution for you. As a general rule, when you have some "exotic" need that Cypher doesn't satisfy directly, but that doesn't feel far from what Cypher can do, take a look at the APOC library! It has procedures to duplicate nodes (which I used in this solution), move a relationship from one node to another, and so on.
Here is a link to the docs: apoc.refactor.cloneNodesWithRelationships - APOC Documentation
Note that the procedure I use here is part of APOC core, which is shipped natively with Neo4j versions > 4.1.1; if you have an earlier version, you need to install it as a plugin.
// More or less the same start as yours
MATCH (n:Country) WHERE EXISTS(n.country)
WITH n, split(n.country, ";") AS countryarray
UNWIND countryarray AS acountry
WITH collect(n) AS splitNodes, trim(acountry) AS trimmed
// APOC call to clone each node, once per element in the split list
CALL apoc.refactor.cloneNodesWithRelationships(splitNodes) YIELD input, output
WITH splitNodes, output AS newNodes, trimmed
UNWIND splitNodes AS oldNodes
// Assign the new country property
SET newNodes.country = trimmed
// Delete the old nodes
DETACH DELETE oldNodes
A quick note here: I would rename your property "country" to something like "countryCode", or just "code", since "country" is easily confused with the label.
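For illustration, such a rename could be sketched roughly as follows. This assumes "countryCode" as the new property name (one of the options mentioned above) and that every :Country node carrying a "country" property should be migrated; adjust both to your model.

```cypher
// Copy the value to the new property name, then drop the old one.
// "countryCode" is only the suggested name, not a requirement.
MATCH (n:Country)
WHERE n.country IS NOT NULL
SET n.countryCode = n.country
REMOVE n.country
```

If you run this after the split, the new single-code nodes are renamed in one pass. Alternatively, set the new property name directly in the split query.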
Cheers, and happy new year to you too!
Marius
01-08-2021 09:12 AM
Thank you so much. I greatly appreciate the help!
01-20-2023 08:58 AM
Just a small correction for anyone else trying to do this:
apoc.refactor.cloneNodesWithRelationships is deprecated; use apoc.refactor.cloneNodes instead. The call would look like
apoc.refactor.cloneNodes(splitNodes, true), since the second argument is withRelationships: https://neo4j.com/labs/apoc/4.1/overview/apoc.refactor/apoc.refactor.cloneNodes/
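Put together, the earlier query with the replacement procedure swapped in might look something like the sketch below. Two things here are my own assumptions rather than part of the original answer: I use `n.country IS NOT NULL` in place of `EXISTS(n.country)`, since newer Neo4j versions no longer accept the latter form, and I moved the deletes behind a second `collect` so that every clone exists before any original node is removed. Please try it on a copy of your data first.

```cypher
// Group the original nodes by trimmed country code
MATCH (n:Country)
WHERE n.country IS NOT NULL
WITH n, split(n.country, ";") AS countryarray
UNWIND countryarray AS acountry
WITH collect(n) AS splitNodes, trim(acountry) AS trimmed
// Clone each node in the group; true = copy its relationships as well
CALL apoc.refactor.cloneNodes(splitNodes, true) YIELD output
WITH splitNodes, trimmed, output AS newNode
// Give each clone a single country code
SET newNode.country = trimmed
// Aggregate so that all clones are created before anything is deleted
WITH collect(splitNodes) AS groups
UNWIND groups AS grp
UNWIND grp AS oldNode
WITH DISTINCT oldNode
// Remove the original multi-value nodes
DETACH DELETE oldNode
```

Note that, like the original answer, this creates one clone per original node and code, so a code such as "us" that appears on two original nodes ends up on two separate nodes; they are not merged.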