Neo4j

jhayes8552 · ‎07-30-2021

Hello,

My problem is that whenever I try to import data via a large csv and connect to a central node, it seems to make many copies of the central node (see photo). The brown nodes are what I want connected to just one "Earth Justice" node. I realize I can merge duplicate nodes, but I would like to have it right as I load in.

My code is as follows:

:auto Using periodic commit
LOAD CSV WITH HEADERS FROM 'file:///EJ_Sample.csv' as row
CREATE (l:labid) SET l+=row
CREATE (o:Origin {name:"Earth Justice"})
MERGE (l)<-[:Created]-(o)

Thanks in advance.

Joel · ‎07-30-2021

MERGE can be a bit confusing to use, I agree; there are nuances that even experienced users run into again (and again). At a glance, I think you may be explicitly creating the nodes (and so creating duplicates after the first time). The MERGE statement itself is perhaps OK.

ameyasoft · ‎07-30-2021

Try this:
LOAD CSV WITH HEADERS FROM 'file:///EJ_Sample.csv' as row

MATCH (o:Origin {name:"Earth Justice"})
with o
CREATE (l:labid) SET l+=row
MERGE (l)<-[:Created]-(o)

jhayes8552 · ‎07-30-2021

After copying and pasting your example, I get this error message.

Variable `row` not defined (line 5, column 25 (offset: 130))
"CREATE (l:labid) SET l+=row"

It doesn't quite make sense to me why that doesn't work.
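
The error comes from the line `with o`: a WITH clause only carries forward the variables it lists, so `row` is out of scope by the time SET runs. A minimal sketch of the same query with `row` carried through (it still assumes the Origin node already exists, since MATCH creates nothing):

```cypher
LOAD CSV WITH HEADERS FROM 'file:///EJ_Sample.csv' AS row
MATCH (o:Origin {name: "Earth Justice"})
// WITH drops every variable it does not list, so row must be named here
WITH o, row
CREATE (l:labid)
SET l += row
MERGE (l)<-[:Created]-(o)
```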

Benoit_d · ‎07-30-2021

Hi,

To ensure that MERGE recognizes the node you are addressing, you should create a uniqueness constraint on one property of the node label before loading the data.

CREATE CONSTRAINT ON (orig:Origin) ASSERT orig.name IS UNIQUE

Then you will be able to MERGE this "Origin"-labelled node without recreating it: the first appearance of "Earth Justice" creates the node, and all later ones merge with it.
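
On newer Neo4j releases the constraint syntax differs. As a sketch, assuming Neo4j 4.4 or later (where the FOR ... REQUIRE form is available; the ON ... ASSERT form was removed in 5.0), the same constraint could be written as follows. The constraint name origin_name_unique is made up for illustration:

```cypher
// Hypothetical constraint name; IF NOT EXISTS makes the statement safe to re-run
CREATE CONSTRAINT origin_name_unique IF NOT EXISTS
FOR (o:Origin) REQUIRE o.name IS UNIQUE
```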

:auto Using periodic commit
LOAD CSV WITH HEADERS FROM 'file:///EJ_Sample.csv' as row
CREATE (l:labid) SET l+=row
MERGE (o:Origin {name:"Earth Justice"})
MERGE (l)<-[:Created]-(o)
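
One way to check the result after loading, as a sketch that uses only the labels and relationship type from this thread:

```cypher
// If the load went as intended, originNodes should be 1 and
// connectedNodes should equal the number of labid nodes that were loaded
MATCH (o:Origin {name: "Earth Justice"})-[:Created]->(l:labid)
RETURN count(DISTINCT o) AS originNodes, count(l) AS connectedNodes
```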

Note that a MERGE like this normally implies that the node (o:Origin {name:"Earth Justice"}) comes from the file, which is not the case in your Cypher: the name is hard-coded and no column is referenced.
If the Origin node already exists, just MATCH it:

:auto Using periodic commit
LOAD CSV WITH HEADERS FROM 'file:///EJ_Sample.csv' as row
CREATE (l:labid) SET l+=row
MATCH (o:Origin {name:"Earth Justice"})
MERGE (o)-[:Created]->(l)

In some cases, for instance if you are loading a new origin to which every node from the source has to be connected, but a node with that name already exists, you will have to:

create a temporary label, e.g. tempOrigin,
create a constraint on this label,
load the data,
drop the constraint (change "CREATE CONSTRAINT ON ..." into "DROP CONSTRAINT ON ..."),
add the label Origin to all nodes labelled tempOrigin (there should be only one),
remove the label tempOrigin from those nodes.

A piece of cake 😉
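
The temporary-label steps above could be sketched roughly as follows, using the Neo4j 4.x constraint syntax from this thread. The statements are meant to be run one at a time. The sketch assumes there is no uniqueness constraint on Origin.name when the relabelling runs, since adding a second Origin node with an existing name would otherwise be rejected:

```cypher
// Steps 1-2: uniqueness constraint on the temporary label
CREATE CONSTRAINT ON (t:tempOrigin) ASSERT t.name IS UNIQUE;

// Step 3: load the data, attaching every row to the single tempOrigin node
:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///EJ_Sample.csv' AS row
MERGE (t:tempOrigin {name: "Earth Justice"})
CREATE (l:labid)
SET l += row
MERGE (t)-[:Created]->(l);

// Step 4: drop the temporary constraint
DROP CONSTRAINT ON (t:tempOrigin) ASSERT t.name IS UNIQUE;

// Steps 5-6: add the final label and remove the temporary one
MATCH (t:tempOrigin)
SET t:Origin
REMOVE t:tempOrigin;
```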

jhayes8552 · ‎07-30-2021

Thanks, appreciate the help. The constraint was a good tip.

jhayes8552 · ‎07-30-2021

One more thing: For the 3rd block of code, the error message I get is that

WITH is required between SET and MATCH

Adding a WITH doesn't get the code to run, however. Thanks again for the help!

Benoit_d · ‎07-30-2021

Put the MATCH before the CREATE and move the SET to the end, so that no reading clause follows a writing clause:

:auto Using periodic commit
LOAD CSV WITH HEADERS FROM 'file:///EJ_Sample.csv' as row
MATCH (o:Origin {name:"Earth Justice"})
CREATE (l:labid)
MERGE (o)-[:Created]->(l)
SET l+=row

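or merge the Origin node first and then the relationship (a single MERGE over the whole pattern would create a new Origin node for every freshly created l, because the pattern never exists yet):

:auto Using periodic commit
LOAD CSV WITH HEADERS FROM 'file:///EJ_Sample.csv' as row
CREATE (l:labid)
MERGE (o:Origin {name:"Earth Justice"})
MERGE (o)-[:Created]->(l)
SET l+=row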
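
A further variant keeps the CREATE and SET ahead of the MATCH and adds the WITH that the error message asks for. Cypher requires a WITH between a writing clause (CREATE, SET) and a following reading clause (MATCH), and the WITH has to list every variable that is needed afterwards. This is only a sketch; note that if no matching Origin node exists, the labid nodes are still created but no relationship is added:

```cypher
:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///EJ_Sample.csv' AS row
CREATE (l:labid)
SET l += row
// WITH separates the writing clauses above from the MATCH below;
// only l is needed from here on
WITH l
MATCH (o:Origin {name: "Earth Justice"})
MERGE (o)-[:Created]->(l)
```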

Connecting one node to multiple