cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Populating database from CSV file with relations

I am relatively new to Neo4j and am looking for some help with correctly importing data from a CSV that I have.

I have a csv file in the format seen below. I am trying to create a database where we have 'People' and 'Competence' objects. The first column contains a persons name, and the subsequent columns each contain a competence/skill, where the column header denotes a level of proficiency (i.e beginner, intermediate, etc.)

The resulting graph should then contain objects for each Person, and Competence, without duplicates. There should also be relationships between People and Competences if the competence is present in the CSV row for that person, with a property denoting the skill level.

I know I can of course create the database object by object, relation by relation, but have no desire to do so. I have also read the documentation for loading csvs in cypher but could not quite figure out how to handle relationships in this form. Any help is much appreciated.

I am using neo4j-4.2.0-community with docker.

2X_b_b5462c948434110b81fc8ef070fa2ac13221cced.png


test.txt (195 Bytes)

1 ACCEPTED SOLUTION

ameyasoft
Graph Maven
Try this:

LOAD CSV WITH HEADERS FROM "file:///skills.csv" AS row
with row

merge (a:Name{name: row.Name})

with row.Beginner as b1, row, a
with replace(b1, '"', '') as b2, row, a
with split (b2, ',') as b3, row, a

FOREACH (i in range(0,size(b3) - 1) |
MERGE (b:Skill {skill: b3[i]})
MERGE (a)-[:BEGINNER]->(b)
)

with a, row

with row.Intermediate as i1, row, a
with replace(i1, '"', '') as i2, row, a
with split (i2, ',') as i3, row, a

FOREACH (i in range(0,size(i3) - 1) |
MERGE (b:Skill {skill: i3[i]})
MERGE (a)-[:INTERMEDIATE]->(b)
)

with a, row

with row.Expert as e1, row, a
with replace(e1, '"', '') as e2, row, a
with split (e2, ',') as e3, row, a

FOREACH (i in range(0,size(e3) - 1) |
MERGE (b:Skill {skill: e3[i]})
MERGE (a)-[:EXPERT]->(b)
)
;

Added 13 labels, created 13 nodes, set 13 properties, created 14 relationships, completed after 876 ms.

Result:

match(a:Name)-[:INTERMEDIATE]-(b:Skill)
return a, b

Result:
2X_4_47c812f8478bcc43e304ce1bf2bafdf8ecf1f21d.png

View solution in original post

3 REPLIES 3

I think there's a more elegant way to do this, but not an expert on APOC.

// load skills
CALL apoc.load.csv('skills.csv',{header:true})
YIELD map
CALL apoc.merge.node(["Person"],{name:map.Name}, {}, {}) yield node as person
UNWIND split(map.Beginner,",") as bskill
CALL apoc.merge.node(["Skill"],{name:trim(bskill)}, {}, {}) yield node as nbskill
CALL apoc.merge.relationship(person, "Knows",{}, {level:"Beginner"}, nbskill, {level:"Beginner"}) YIELD rel as brel
UNWIND split(map.Intermediate,",") as iskill
CALL apoc.merge.node(["Skill"],{name:trim(iskill)}, {}, {}) yield node as niskill
CALL apoc.merge.relationship(person, "Knows",{}, {level:"Intermediate"}, niskill, {level:"Intermediate"}) YIELD rel as irel
UNWIND split(map.Expert,",") as eskill
CALL apoc.merge.node(["Skill"],{name:trim(eskill)}, {}, {}) yield node as neskill
CALL apoc.merge.relationship(person, "Knows",{}, {level:"Expert"}, neskill, {level:"Expert"}) YIELD rel as erel
return count(*)

adding elegance would involve iterating over the columns and using the header to drive the level property value, but I haven't done it.

NOTE: this code will handle subsequent runs (i.e. i go from APOC beginner, to intermediate during a reload), but it doesn't remove skills that aren't present in a subsequent run.

Name Beginner Intermediate Expert
Robert Quinn APOC, SPARQL cypher, react java, SQL, modeling
Joe Smith SHACL, SPARQL cypher, regex python, SQL, reltio
Jill Smith cypher, regex python, SQL, reltio SHACL, SPARQL

ameyasoft
Graph Maven
Try this:

LOAD CSV WITH HEADERS FROM "file:///skills.csv" AS row
with row

merge (a:Name{name: row.Name})

with row.Beginner as b1, row, a
with replace(b1, '"', '') as b2, row, a
with split (b2, ',') as b3, row, a

FOREACH (i in range(0,size(b3) - 1) |
MERGE (b:Skill {skill: b3[i]})
MERGE (a)-[:BEGINNER]->(b)
)

with a, row

with row.Intermediate as i1, row, a
with replace(i1, '"', '') as i2, row, a
with split (i2, ',') as i3, row, a

FOREACH (i in range(0,size(i3) - 1) |
MERGE (b:Skill {skill: i3[i]})
MERGE (a)-[:INTERMEDIATE]->(b)
)

with a, row

with row.Expert as e1, row, a
with replace(e1, '"', '') as e2, row, a
with split (e2, ',') as e3, row, a

FOREACH (i in range(0,size(e3) - 1) |
MERGE (b:Skill {skill: e3[i]})
MERGE (a)-[:EXPERT]->(b)
)
;

Added 13 labels, created 13 nodes, set 13 properties, created 14 relationships, completed after 876 ms.

Result:

match(a:Name)-[:INTERMEDIATE]-(b:Skill)
return a, b

Result:
2X_4_47c812f8478bcc43e304ce1bf2bafdf8ecf1f21d.png

Thank you both so much for your help. Robert I'm sure you have made a fine and working solution but when I saw the responses this morning continuing with the least resistance meant using the solution without extra packages. Ameya, the solution worked great, The only thing I changed was to use one relationship type 'knows' and then have a property denoting the skill level. Thanks again guys.