cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Match imported CSV column values with existing nodes?

apeng21
Node Link

I need some help in importing a CSV file and matching the contents of one column to existing nodes in my graph database. The CSV currently looks like this:

ContentName, URL, Creator, Tags, Comments

All of these columns contain strings. However, the "Tags" column contains individual words separated by commas, like so: Word1, Word2, Word3, Word4.

These words all have a corresponding node that is already in the database. I want to separate those Words from one another. Then I need to match, for example, Word1 in the CSV to Word1 as a node in the database and create a relationship. Is there a way to do this?

1 ACCEPTED SOLUTION

Try this:

//in .csv change this to:
//with row.Tags as tags.........
with "Basics, Sketching, Perspective" as tags
with split(tags, ",") as t1

// Change the appropriate node label.....
match (t:Tag)
where t.tag in t1

// creating a Contentname node (row 2).....I didn't add all the properties for this test......

merge (c:ContentName {name: "Drawing Fundamentals", creator: "Will Kemp", type: "Course"})

//code to create relationships between (c) and 'Tag' nodes......
merge (c)-[:hasTag]->(t)
return c, t

Result:

View solution in original post

12 REPLIES 12

ameyasoft
Graph Maven
Assuming this:
(a:Tags {tags: "Word1, Word2, Word3, Word4"}) and
(b:Word {word:"Word2"})

match (a:Tags)
with split(a.tags, ",") as t1

//test
return t1[0], t1[1], t1[2], t1[3]
Result: Word1, Word2, Word3, Word4

match (b:Word) 
where b.word = t1[1]

From here you continue further.

Thanks for responding. So I'm seeing that you're extracting the whole CSV string (Word1, Word2, etc.) into a node first, splitting its contents into an array, and then matching them to existing nodes with

match (b:Word) 
where b.word = t1[1]

But this part of the code doesn't work, namely because I'm trying to transform each CSV row into its own node (with ContentName, URL, Creator, Comments) with labels. Then I want to connect this node to existing Word1/Word2/Word3 nodes in the database. This chart should make it clearer:

Post couple of data lines from your .csv file including the headers.

Is it fair to assume that you are using .csv file to create 'Content' nodes.? You show 'Tags', but is not clear how you are using 'Tags'.

Sorry for getting back late! Here's a few example data lines from the CSV file:


And yes each row should represent a node, with ContentName, URL, and Type as node labels.
For the purpose of this example, all of the tags listed under the Tags column already exist in the database. I just want to connect the row nodes to those existing Tag nodes. The Word1/Word2/Word3 bubbles in the posted diagram represent these.

Thanks for the info. It's very helpful for me to design the code.

Try this:

//in .csv change this to:
//with row.Tags as tags.........
with "Basics, Sketching, Perspective" as tags
with split(tags, ",") as t1

// Change the appropriate node label.....
match (t:Tag)
where t.tag in t1

// creating a Contentname node (row 2).....I didn't add all the properties for this test......

merge (c:ContentName {name: "Drawing Fundamentals", creator: "Will Kemp", type: "Course"})

//code to create relationships between (c) and 'Tag' nodes......
merge (c)-[:hasTag]->(t)
return c, t

Result:

Thank you for taking the time to figure this out
But if I have thousands of rows in the CSV, is there a way to reference the contents of the CSV "Tags" column without having to type out what's inside? The same goes for the other columns.

I already mentioned in the comment  to replace my WITH statements:
Replace my first WITH statement:

my sample:

with "Basics, Sketching, Perspective" as tags

//in .csv change this to:

with row.tags as tags
 That's all the change!

It works now! Thank you so much for your expertise.

Glad to be of some help!

apeng21
Node Link

Got a new problem. For some reason my code is matching tags inconsistently with my existing nodes. For example, Drawing Fundamentals has the tags "Basics", "Sketching" and "Perspective". But as you can see from the chart, those tags have not been matched. Here is the code I adapted:

load csv with headers from "https://docs.google.com/spreadsheets/d/15Xvz8JgG_AztY6Ww18EuO5weoVDK4mD8bqGR0UlmGRM/export?format=csv" as csv
with csv.Tags as tags
with split(tags, ", ") as t1
match (n:skos__Concept)
where n.skos__prefLabel in t1
load csv with headers from "https://docs.google.com/spreadsheets/d/15Xvz8JgG_AztY6Ww18EuO5weoVDK4mD8bqGR0UlmGRM/export?format=csv" as csv
merge (c:ContentName {name: csv.ContentName, creator: csv.Creator, URL: csv.URL, Type: csv.Type})
merge (c)-[:hasTag]->(n)
return c, n

It produces this in Neo4j. As you can see, Drawing Fundamentals has not been matched to "Basics", "Sketching" and "Perspective".