cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Better way to merge/match loading from csv

norbert
Node Link

I am reading data from a file and want to create nodes and relations between them. There is a chance that either of the two nodes of the relation are not existing by now.

I solved this by:

LOAD CSV WITH HEADERS FROM "foobar.csv" AS ROW
  MERGE (a:TypeA { name: row.name, revision: toInt(row.revision) })
  MERGE (b:TypeB { name: row.name }) ;

LOAD CSV WITH HEADERS FROM "foobar.csv" AS ROW
  MATCH (a:TypeA { name: row.name, revision: toInt(row.revision) })
  MATCH (b:TypeB { name: row.name }) 
  MERGE (a) -[r:is_instance]-> (b) ;

This loads the csv two times, which is not optimal.

I have the following cases:

  • (a), (b), (r) do not exists: In this case I could have simply used one single MERGE command
  • (b) exists, (a), (r) do not exists: In this case I would need a MATCH against (b) followed by a merge of the relation
  • (a), (b), (r) exists: a simple merge or a match against all three all would be fine.

Using the second variant that MATCH against (b) fails to create anything in the first case, because there is no MATCH at all.

Is there any other way to avoid using the double loop over csv?

Thanks

8 REPLIES 8

Sure, just use MERGE for all 3:

LOAD CSV WITH HEADERS FROM "foobar.csv" AS row
  MERGE (a:TypeA { name: row.name, revision: toInt(row.revision) })
  MERGE (b:TypeB { name: row.name })
  MERGE (a) -[r:is_instance]-> (b) ;

Thanks, - I guess without the ; after the second MERGE.

I somehow assumed that the first two MERGE will not be ready for the third one.

Thanks for the quick response, and sorry for the rather stupid question

My mistake on the ;, and you're welcome!

No stupid questions here, learning is full of trial and error.

jsmccrumb
Graph Buddy

I think doing it all in one go will cause an EAGER (run it through an EXPLAIN). If you have a large CSV this could lead to problems. Might be better to iterate over the file twice. Just something to keep an eye on, if it works loading all at once then no worries.

In this case since the nodes are of two different types, no Eager will be introduced.

Thanks for the clarification, good to know!

I don't know about EAGER, sorry, but I anyway added a USING PERIODIC COMMIT 1000 before each load.

I was mistaken, you only have EAGER if the nodes had the same label. If you run:

EXPLAIN
LOAD CSV WITH HEADERS FROM "foobar.csv" AS row
MERGE (a:TypeA { name: row.name, revision: toInt(row.revision) })
MERGE (b:TypeB { name: row.name })
MERGE (a) -[r:is_instance]-> (b) ;

in the browser it comes back with the "query plan". If you see a line that just says "EAGER" when loading a CSV, its bad news. You should be good