Heads up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Link nodes hierarchically during LOAD CSV

Hi everyone,
I'm reaching out because I have a problem.
I have a CSV file, produced by an export, that is structured as follows:

 L1  | L2  | L3  | L4  | ...
 id1 |     |     |     | ...
     | id2 |     |     | ...
     | id3 |     |     | ...
     |     | id4 |     | ...
     |     | id5 |     | ...
     |     |     | id6 | ...
     |     |     | id7 | ...
     | id8 |     |     | ...
 id9 |     |     |     | ...

etc.
As you can see, the rows form a hierarchy:

  • id1
    • id2
    • id3
      • id4
      • id5
        • id6
        • id7
    • id8
  • id9
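One way to rebuild this hierarchy before importing is to remember the most recent id seen at each level: a row's parent is the last id seen one level up. Because that depends on row order, it fits best as a preprocessing step rather than inside the Cypher import. A minimal Python sketch, assuming the export is comma-separated with a header row, that each data row has exactly one non-empty cell, and that top-level ids hang off a hypothetical `root` id; the function names are illustrative only:

```python
import csv
import io


def hierarchy_edges(export_text: str, root: str = "root") -> list[tuple[str, str]]:
    """Turn a level-per-column export into (parent, child) pairs.

    Assumes a header row (L1, L2, ...), rows in tree order, and exactly one
    non-empty cell per row whose column index is the node's depth.
    """
    reader = csv.reader(io.StringIO(export_text))
    next(reader, None)  # skip the L1,L2,... header row
    path: list[str] = []  # path[d] = most recent id seen at depth d
    edges: list[tuple[str, str]] = []
    for row in reader:
        cells = [(depth, value.strip()) for depth, value in enumerate(row) if value.strip()]
        if not cells:
            continue  # ignore blank lines
        depth, node_id = cells[0]
        if depth > len(path):
            raise ValueError(f"{node_id!r} at depth {depth} has no parent row above it")
        parent = root if depth == 0 else path[depth - 1]
        del path[depth:]  # leave any deeper branch we were in
        path.append(node_id)
        edges.append((parent, node_id))
    return edges


def to_parent_child_csv(edges: list[tuple[str, str]]) -> str:
    """Serialize the pairs as a parent,child CSV with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["parent", "child"])
    writer.writerows(edges)
    return out.getvalue()


if __name__ == "__main__":
    export = "L1,L2,L3,L4\nid1,,,\n,id2,,\n,id3,,\n,,id4,\n,,id5,\n,,,id6\n,,,id7\n,id8,,\nid9,,,\n"
    print(to_parent_child_csv(hierarchy_edges(export)), end="")
```

For the sample above this prints `parent,child` followed by root,id1 / id1,id2 / id1,id3 / id3,id4 / id3,id5 / id5,id6 / id5,id7 / id1,id8 / root,id9, one pair per line, and each output row can then be imported on its own.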

I'm trying to recreate that hierarchy in my graph using Cypher, but I honestly don't know where to begin... I already have a working import using LOAD CSV with all nodes and some relationships created. The only missing part now is the ()-[:CHILD_OF]->() relationship.

Has anyone faced this situation before? Do you have a strategy and/or code to share?
Any help is very much appreciated.

Accepted solution

It would be better for your CSV to be simpler. As it stands, the number of columns depends on the depth of the hierarchy, whereas for simple connections like these you should have a fixed number of columns.

In this case, it would be far easier to use a CSV formatted like:

parent,child
root,id1
root,id9
id1,id2
id1,id3
id1,id8
id3,id4
id3,id5
id5,id6
id5,id7

With a format like this, everything a relationship needs is on a single row (relationships represented in a CSV should not depend on other rows or on row ordering). Then do passes to MERGE the nodes (two passes, one for the parent column and one for the child column, to avoid Eager operators), then a final pass to MATCH the nodes and MERGE the relationships between them.
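The three passes could look like the following when driven from Python. This is a sketch, not a definitive implementation: the `Node` label, the `id` property and the file URL are hypothetical, and `run` stands for anything that executes a Cypher string with keyword parameters, for example `Session.run` from the official neo4j Python driver. Each node pass MERGEs only one column, and the last pass MATCHes both ends and MERGEs `CHILD_OF` from child to parent, the direction used in the question.

```python
from typing import Callable

# Hypothetical label/property names (Node, id); adjust to your model.
NODE_PASSES = [
    # Pass 1: one node per distinct parent id.
    """
    LOAD CSV WITH HEADERS FROM $url AS row
    MERGE (:Node {id: row.parent})
    """,
    # Pass 2: one node per distinct child id.
    """
    LOAD CSV WITH HEADERS FROM $url AS row
    MERGE (:Node {id: row.child})
    """,
]

# Pass 3: nodes exist now, so only MATCH them and MERGE the relationship.
RELATIONSHIP_PASS = """
LOAD CSV WITH HEADERS FROM $url AS row
MATCH (parent:Node {id: row.parent})
MATCH (child:Node {id: row.child})
MERGE (child)-[:CHILD_OF]->(parent)
"""


def merge_hierarchy(run: Callable[..., object], url: str) -> None:
    """Run both node passes, then the relationship pass, as separate queries."""
    for query in NODE_PASSES:
        run(query, url=url)
    run(RELATIONSHIP_PASS, url=url)
```

With the driver this might be called as `merge_hierarchy(session.run, "file:///hierarchy.csv")` inside a `with driver.session() as session:` block; the file name is again an assumption.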

That said, it sounds like your nodes already exist. In that case, add an index on the property you match them by (if one doesn't exist already) so lookups are fast, then MATCH the nodes and CREATE the relationships between them.
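For nodes that already exist, a sketch along the same lines, again assuming a hypothetical `Node` label keyed by `id`, a hypothetical index name `node_id`, Neo4j 4.x/5.x index syntax, and `run` standing for something like the driver's `Session.run`. The index is created in its own query because Neo4j does not allow schema changes and data writes in the same transaction. MATCH silently drops rows whose ids are not found, so the `root` rows only link up if a node with that id exists.

```python
from typing import Callable

# Hypothetical names: label Node, key property id, index name node_id.
CREATE_INDEX = "CREATE INDEX node_id IF NOT EXISTS FOR (n:Node) ON (n.id)"

LINK_EXISTING = """
LOAD CSV WITH HEADERS FROM $url AS row
MATCH (parent:Node {id: row.parent})
MATCH (child:Node {id: row.child})
CREATE (child)-[:CHILD_OF]->(parent)
"""


def link_existing_nodes(run: Callable[..., object], url: str) -> None:
    """Index the lookup key, then create one CHILD_OF per CSV row."""
    run(CREATE_INDEX)  # schema change in its own auto-commit query
    run(LINK_EXISTING, url=url)  # data pass: MATCH both ends, CREATE the edge
```

CREATE is fine as long as the import runs once; rerunning it would duplicate the relationships, in which case MERGE is the safer choice.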

I followed your advice and it works perfectly. I added some Excel macros to reduce manual data transformation as much as possible. Thanks @andrew.bowman!