Neo4j

TheFoolishPupil · ‎11-30-2020

I am relatively new to Neo4j and am looking for some help with correctly importing data from a CSV that I have.

I have a csv file in the format seen below. I am trying to create a database where we have 'People' and 'Competence' objects. The first column contains a persons name, and the subsequent columns each contain a competence/skill, where the column header denotes a level of proficiency (i.e beginner, intermediate, etc.)

The resulting graph should then contain objects for each Person, and Competence, without duplicates. There should also be relationships between People and Competences if the competence is present in the CSV row for that person, with a property denoting the skill level.

I know I can of course create the database object by object, relation by relation, but have no desire to do so. I have also read the documentation for loading csvs in cypher but could not quite figure out how to handle relationships in this form. Any help is much appreciated.

I am using neo4j-4.2.0-community with docker.

test.txt (195 Bytes)

ameyasoft · ‎11-30-2020

Try this:

LOAD CSV WITH HEADERS FROM "file:///skills.csv" AS row
with row

merge (a:Name{name: row.Name})

with row.Beginner as b1, row, a
with replace(b1, '"', '') as b2, row, a
with split (b2, ',') as b3, row, a

FOREACH (i in range(0,size(b3) - 1) |
MERGE (b:Skill {skill: b3[i]})
MERGE (a)-[:BEGINNER]->(b)
)

with a, row

with row.Intermediate as i1, row, a
with replace(i1, '"', '') as i2, row, a
with split (i2, ',') as i3, row, a

FOREACH (i in range(0,size(i3) - 1) |
MERGE (b:Skill {skill: i3[i]})
MERGE (a)-[:INTERMEDIATE]->(b)
)

with a, row

with row.Expert as e1, row, a
with replace(e1, '"', '') as e2, row, a
with split (e2, ',') as e3, row, a

FOREACH (i in range(0,size(e3) - 1) |
MERGE (b:Skill {skill: e3[i]})
MERGE (a)-[:EXPERT]->(b)
)
;

Added 13 labels, created 13 nodes, set 13 properties, created 14 relationships, completed after 876 ms.

Result:

match(a:Name)-[:INTERMEDIATE]-(b:Skill)
return a, b

Result:

View solution in original post

robert_quinn · ‎11-30-2020

I think there's a more elegant way to do this, but not an expert on APOC.

// load skills
CALL apoc.load.csv('skills.csv',{header:true})
YIELD map
CALL apoc.merge.node(["Person"],{name:map.Name}, {}, {}) yield node as person
UNWIND split(map.Beginner,",") as bskill
CALL apoc.merge.node(["Skill"],{name:trim(bskill)}, {}, {}) yield node as nbskill
CALL apoc.merge.relationship(person, "Knows",{}, {level:"Beginner"}, nbskill, {level:"Beginner"}) YIELD rel as brel
UNWIND split(map.Intermediate,",") as iskill
CALL apoc.merge.node(["Skill"],{name:trim(iskill)}, {}, {}) yield node as niskill
CALL apoc.merge.relationship(person, "Knows",{}, {level:"Intermediate"}, niskill, {level:"Intermediate"}) YIELD rel as irel
UNWIND split(map.Expert,",") as eskill
CALL apoc.merge.node(["Skill"],{name:trim(eskill)}, {}, {}) yield node as neskill
CALL apoc.merge.relationship(person, "Knows",{}, {level:"Expert"}, neskill, {level:"Expert"}) YIELD rel as erel
return count(*)

adding elegance would involve iterating over the columns and using the header to drive the level property value, but I haven't done it.

NOTE: this code will handle subsequent runs (i.e. i go from APOC beginner, to intermediate during a reload), but it doesn't remove skills that aren't present in a subsequent run.

Name	Beginner	Intermediate	Expert
Robert Quinn	APOC, SPARQL	cypher, react	java, SQL, modeling
Joe Smith	SHACL, SPARQL	cypher, regex	python, SQL, reltio
Jill Smith	cypher, regex	python, SQL, reltio	SHACL, SPARQL

ameyasoft · ‎11-30-2020

Try this:

LOAD CSV WITH HEADERS FROM "file:///skills.csv" AS row
with row

merge (a:Name{name: row.Name})

with row.Beginner as b1, row, a
with replace(b1, '"', '') as b2, row, a
with split (b2, ',') as b3, row, a

FOREACH (i in range(0,size(b3) - 1) |
MERGE (b:Skill {skill: b3[i]})
MERGE (a)-[:BEGINNER]->(b)
)

with a, row

with row.Intermediate as i1, row, a
with replace(i1, '"', '') as i2, row, a
with split (i2, ',') as i3, row, a

FOREACH (i in range(0,size(i3) - 1) |
MERGE (b:Skill {skill: i3[i]})
MERGE (a)-[:INTERMEDIATE]->(b)
)

with a, row

with row.Expert as e1, row, a
with replace(e1, '"', '') as e2, row, a
with split (e2, ',') as e3, row, a

FOREACH (i in range(0,size(e3) - 1) |
MERGE (b:Skill {skill: e3[i]})
MERGE (a)-[:EXPERT]->(b)
)
;

Added 13 labels, created 13 nodes, set 13 properties, created 14 relationships, completed after 876 ms.

Result:

match(a:Name)-[:INTERMEDIATE]-(b:Skill)
return a, b

Result:

TheFoolishPupil · ‎12-01-2020

Thank you both so much for your help. Robert I'm sure you have made a fine and working solution but when I saw the responses this morning continuing with the least resistance meant using the solution without extra packages. Ameya, the solution worked great, The only thing I changed was to use one relationship type 'knows' and then have a property denoting the skill level. Thanks again guys.

Neo4j

Populating database from CSV file with relations