cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Update data via .csv file

Hello guys.

I have a question in mind but still unable to find any discussion that answers my question.

So the situation is, I have a dataset that I have loaded into local via .csv. I am wondering, if there are any amendments made on my data (eg; change in work, address, nodes to delete), what is the best way to my dataset in neo4j?

Updating the data one by one is not an option, since a lot of observations will be updated.

Thanks in advance.

1 ACCEPTED SOLUTION

I used the script below to do the initial loading and updating. Bear in mind that the updating using the script below only works for updating the properties. Meanwhile, nodes will only be created on top of the current node available.

 

CREATE CONSTRAINT IF NOT EXISTS ON (n: `CustID`) ASSERT n.`custID` IS UNIQUE;
CREATE CONSTRAINT IF NOT EXISTS ON (n: `PhoneNumber`) ASSERT n.`phoneNum` IS UNIQUE;

:auto USING PERIODIC COMMIT        
LOAD CSV WITH HEADERS FROM "file:///Dummy.csv" AS row
WITH row
MERGE (c:CustID {custID: toInteger(row.CustomerID)})
ON CREATE SET      //change to ON MATCH SET for update
c.custName = row.CustName, 
c.idNum = toInteger(row.IDNumber),
c.nationality = row.Nationality;

//create PhoneNumber node
:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///Dummy.csv" AS row
WITH row
MERGE (p:PhoneNumber {phoneNum: row.PhoneNumber});

:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///Dummy.csv" AS row
WITH row
MATCH (c:CustID {custID: toInteger(row.CustomerID)})
MATCH (p:PhoneNumber {phoneNum: row.PhoneNumber})
MERGE (p)-[r:CONTACT_NO]-(c);

 

I'm thinking of deleting nodes associated with CustID in the update file before recreating the nodes and relationships again. Do you think that is the best possible solution?

View solution in original post

7 REPLIES 7

Why is it not an option? You would just run a script regularly that gets the updates as a csv and updates the data.

LOAD CSV WITH HEADERS FROM "url" AS row
CALL { WITH n
MATCH (n:Node) WHERE n.id = row.id
SET n.updated = toFloat(row.updated) // for example
} IN TRANSACTIONS OF 50000 ROWS;

Thank you for the reply @michael_hunger 

I tried rerunning the script with the updated data, and it works at updating properties of the node, but not the node relationships.

For example, I updated a few of customer information, some of the information is stored as property of the 'Customer' node. Rerunning the initial script I used to load the data works at updating the property.

However, the customer also has updated his CONTACT_NO information, where it is stored in a node 'PhoneNumber'. Rerunning the initial script will result in creating another 'PhoneNumber' node and CONTACT_NO relationship, meaning the 'Customer' node I updated now has two CONTACT_NO relationships to two 'PhoneNumber' nodes, the old one and the latest. The intended result should be one 'PhoneNumber' node, the latest one.

I wonder if there is a way to write the query for the situation above without having to adjust the script for each node to delete

Can you post your initial script and your update script you are using? 

I used the script below for both initial load and data update. Updating data using script below works for property (in the case of below for example, updating customer email). However, updating customer phone number will not delete the previous node, but simply creating another node for phone number.

 

:auto USING PERIODIC COMMIT         //to commit 500 line at a time
LOAD CSV WITH HEADERS FROM "file:///DummyData.csv" AS row
WITH row
MERGE (c:CustID {cifNum: toInteger(row.CustID)})
ON CREATE SET 
c.custName = row.CustName, 
c.idNum = toInteger(row.IDNumber),
c.email = row.EmailAddress;

LOAD CSV WITH HEADERS FROM "file:///DummyData.csv" AS row
WITH row
MERGE (p:PhoneNumber {phoneNum: row.PhoneNumber});

LOAD CSV WITH HEADERS FROM "file:///DummyData.csv" AS row
WITH row
MATCH (c:CustID {cifNum: toInteger(row.CustID)})
MATCH (p:PhoneNumber {phoneNum: row.PhoneNumber})
MERGE (p)-[r:CONTACT_NO]-(c);

//plus a few other nodes​

 

I'm thinking about deleting everything associated with a particular CustID (nodes, relationship, properties) and, reupdate and recreate everything again. Will that work do you think?

I used the script below to do the initial loading and updating. Bear in mind that the updating using the script below only works for updating the properties. Meanwhile, nodes will only be created on top of the current node available.

 

CREATE CONSTRAINT IF NOT EXISTS ON (n: `CustID`) ASSERT n.`custID` IS UNIQUE;
CREATE CONSTRAINT IF NOT EXISTS ON (n: `PhoneNumber`) ASSERT n.`phoneNum` IS UNIQUE;

:auto USING PERIODIC COMMIT        
LOAD CSV WITH HEADERS FROM "file:///Dummy.csv" AS row
WITH row
MERGE (c:CustID {custID: toInteger(row.CustomerID)})
ON CREATE SET      //change to ON MATCH SET for update
c.custName = row.CustName, 
c.idNum = toInteger(row.IDNumber),
c.nationality = row.Nationality;

//create PhoneNumber node
:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///Dummy.csv" AS row
WITH row
MERGE (p:PhoneNumber {phoneNum: row.PhoneNumber});

:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///Dummy.csv" AS row
WITH row
MATCH (c:CustID {custID: toInteger(row.CustomerID)})
MATCH (p:PhoneNumber {phoneNum: row.PhoneNumber})
MERGE (p)-[r:CONTACT_NO]-(c);

 

I'm thinking of deleting nodes associated with CustID in the update file before recreating the nodes and relationships again. Do you think that is the best possible solution?

To update, I used the above 'delete nodes associated with CustID in the update file' method for updating and it works

Not sure if you really need to extract the Phone Number into as separate node as it's not an essential entity in your domain.

But anyhow.

You can use

MATCH (c:Customer {id:row.customer_id})
MERGE (c)-[:CONTACT_NO]->(p:PhoneNumber)
SET p.phone = row.phoneNumber

Here, merge will get-or-create the relationship and the node with the PhoneNumber label within the context of the customer (one only) and then you can set/override the actual phone number. 

If you have something that already has an immutable ID, e.g. an order, then you can put the order id into that node pattern which would then be unique within the context of that customer too.