Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
11-30-2020 08:54 AM
I am relatively new to Neo4j and am looking for some help with correctly importing data from a CSV that I have.
I have a csv file in the format seen below. I am trying to create a database where we have 'People' and 'Competence' objects. The first column contains a persons name, and the subsequent columns each contain a competence/skill, where the column header denotes a level of proficiency (i.e beginner, intermediate, etc.)
The resulting graph should then contain objects for each Person, and Competence, without duplicates. There should also be relationships between People and Competences if the competence is present in the CSV row for that person, with a property denoting the skill level.
I know I can of course create the database object by object, relation by relation, but have no desire to do so. I have also read the documentation for loading csvs in cypher but could not quite figure out how to handle relationships in this form. Any help is much appreciated.
I am using neo4j-4.2.0-community with docker.
Solved! Go to Solution.
11-30-2020 08:23 PM
Try this:
LOAD CSV WITH HEADERS FROM "file:///skills.csv" AS row
with row
merge (a:Name{name: row.Name})
with row.Beginner as b1, row, a
with replace(b1, '"', '') as b2, row, a
with split (b2, ',') as b3, row, a
FOREACH (i in range(0,size(b3) - 1) |
MERGE (b:Skill {skill: b3[i]})
MERGE (a)-[:BEGINNER]->(b)
)
with a, row
with row.Intermediate as i1, row, a
with replace(i1, '"', '') as i2, row, a
with split (i2, ',') as i3, row, a
FOREACH (i in range(0,size(i3) - 1) |
MERGE (b:Skill {skill: i3[i]})
MERGE (a)-[:INTERMEDIATE]->(b)
)
with a, row
with row.Expert as e1, row, a
with replace(e1, '"', '') as e2, row, a
with split (e2, ',') as e3, row, a
FOREACH (i in range(0,size(e3) - 1) |
MERGE (b:Skill {skill: e3[i]})
MERGE (a)-[:EXPERT]->(b)
)
;
Added 13 labels, created 13 nodes, set 13 properties, created 14 relationships, completed after 876 ms.
Result:
match(a:Name)-[:INTERMEDIATE]-(b:Skill)
return a, b
Result:
11-30-2020 03:02 PM
I think there's a more elegant way to do this, but not an expert on APOC.
// load skills
CALL apoc.load.csv('skills.csv',{header:true})
YIELD map
CALL apoc.merge.node(["Person"],{name:map.Name}, {}, {}) yield node as person
UNWIND split(map.Beginner,",") as bskill
CALL apoc.merge.node(["Skill"],{name:trim(bskill)}, {}, {}) yield node as nbskill
CALL apoc.merge.relationship(person, "Knows",{}, {level:"Beginner"}, nbskill, {level:"Beginner"}) YIELD rel as brel
UNWIND split(map.Intermediate,",") as iskill
CALL apoc.merge.node(["Skill"],{name:trim(iskill)}, {}, {}) yield node as niskill
CALL apoc.merge.relationship(person, "Knows",{}, {level:"Intermediate"}, niskill, {level:"Intermediate"}) YIELD rel as irel
UNWIND split(map.Expert,",") as eskill
CALL apoc.merge.node(["Skill"],{name:trim(eskill)}, {}, {}) yield node as neskill
CALL apoc.merge.relationship(person, "Knows",{}, {level:"Expert"}, neskill, {level:"Expert"}) YIELD rel as erel
return count(*)
adding elegance would involve iterating over the columns and using the header to drive the level property value, but I haven't done it.
NOTE: this code will handle subsequent runs (i.e. i go from APOC beginner, to intermediate during a reload), but it doesn't remove skills that aren't present in a subsequent run.
Name | Beginner | Intermediate | Expert |
---|---|---|---|
Robert Quinn | APOC, SPARQL | cypher, react | java, SQL, modeling |
Joe Smith | SHACL, SPARQL | cypher, regex | python, SQL, reltio |
Jill Smith | cypher, regex | python, SQL, reltio | SHACL, SPARQL |
11-30-2020 08:23 PM
Try this:
LOAD CSV WITH HEADERS FROM "file:///skills.csv" AS row
with row
merge (a:Name{name: row.Name})
with row.Beginner as b1, row, a
with replace(b1, '"', '') as b2, row, a
with split (b2, ',') as b3, row, a
FOREACH (i in range(0,size(b3) - 1) |
MERGE (b:Skill {skill: b3[i]})
MERGE (a)-[:BEGINNER]->(b)
)
with a, row
with row.Intermediate as i1, row, a
with replace(i1, '"', '') as i2, row, a
with split (i2, ',') as i3, row, a
FOREACH (i in range(0,size(i3) - 1) |
MERGE (b:Skill {skill: i3[i]})
MERGE (a)-[:INTERMEDIATE]->(b)
)
with a, row
with row.Expert as e1, row, a
with replace(e1, '"', '') as e2, row, a
with split (e2, ',') as e3, row, a
FOREACH (i in range(0,size(e3) - 1) |
MERGE (b:Skill {skill: e3[i]})
MERGE (a)-[:EXPERT]->(b)
)
;
Added 13 labels, created 13 nodes, set 13 properties, created 14 relationships, completed after 876 ms.
Result:
match(a:Name)-[:INTERMEDIATE]-(b:Skill)
return a, b
Result:
12-01-2020 01:42 AM
Thank you both so much for your help. Robert I'm sure you have made a fine and working solution but when I saw the responses this morning continuing with the least resistance meant using the solution without extra packages. Ameya, the solution worked great, The only thing I changed was to use one relationship type 'knows' and then have a property denoting the skill level. Thanks again guys.
All the sessions of the conference are now available online