cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Matching and Merging data from multiple csv files

Reuben
Graph Buddy

Hi!, Given that I have created nodes and relationships from the "example.csv" file as below. However, I have a new file, "other_names.csv", with the family_names in example.csv as column heads with a series of first_names under each column.

Any HINT/SUGGESTION on how to –> MATCH AND MERGE<– without individually creating nodes for each column, like row.Dickens , row.Wang etc. ? Thanks

"LOAD CSV WITH HEADERS FROM $path AS row "
"MERGE (f:family_name {name: row.family_name}) "
"MERGE (fn:first_name {name: row.first_name}) "
"MERGE (h:hometown {name: row.hometown}) "
"MERGE (r:race {name: row.race}) "
"MERGE (a:age {name: row.age}) "
"MERGE (o:occupation {name: row.occupation}) "
"MERGE (f)-[:first_name_is]->(fn) "
"MERGE (fn)-[:commonly_from]->(h) "
"MERGE (f)-[:often_associated_the_race]->(r) "
"MERGE (fn)-[:age_is]-(a) "
"MERGE (fn)-[:works_as]->(o) "

Reuben_0-1670427445609.png

#Neo4j #CSV #GraphDatabase 

1 ACCEPTED SOLUTION

Try this:

load csv from "file:///other_names.csv" as row
with row as headerRow
limit 1
load csv from "file:///other_names.csv" as row
with headerRow, row
skip 1
unwind range(0,size(headerRow)-1) as index
merge(f:family_name{name:headerRow[index]})
merge(fn:first_name{name:row[index]})
merge (f)-[:first_name_is]->(fn)

View solution in original post

14 REPLIES 14

how do you want the new data related to the existing data? 

I want to update this relationship  "MERGE (f)-[:first_name_is]->(fn) " from the file other_names.csv

exa.png

Do you want to related each first name in each column to their corresponding family name node specified in the column header? 

`that's correct. `those column headers have already been established as nodes in example.csv so i need to match and merge them from this new CSV file

Try this:

load csv from "file:///other_names.csv" as row
with row as headerRow
limit 1
load csv from "file:///other_names.csv" as row
with headerRow, row
skip 1
unwind range(0,size(headerRow)-1) as index
merge(f:family_name{name:headerRow[index]})
merge(fn:first_name{name:row[index]})
merge (f)-[:first_name_is]->(fn)

Reuben
Graph Buddy

Thanks @glilienfield 

the first line should be this instead, right?

load csv from "file:///example.csv" as row
-
using this I get the semantic error: 
Cannot merge the following node because of null property value for 'name': (:family_name {name: null})
So, I changed the range to :
---
load csv from "file:///trial/other_names.csv" as row
with row as headerRow
limit 1
load csv from "file:///trial/other_names.csv" as row
with headerRow, row
skip 1
unwind range(1,size(headerRow)-1) as index
merge(f:family_name{name:headerRow[index]})
merge(fn:first_name{name:row[index]})
merge (f)-[:first_name_is]->(fn)

 

@glilienfield  with reference to these files”—-Considering that I want to skip all the null values , what’s the best way to tackle it. I tried use the “is null approach” but failed to get it right. 

You had change the starting index to one. Is there a column of row numbers in the first column?  If so, this column does not have a header and would explain that null value error, which occurred for the family name (header values).

Are there null values in the row data?  

 

The case I presented earlier was just a sample case to help do a knowledge transfer. The real  data I am working on looks something like this (48 columns):

Screenshot 2022-12-10 at 20.23.14.png

 

There are three approach I can think of for dealing with null values during import: 1) coalesce to set a default value, 2) call subquery testing for null condition, and 3) apoc.do.when 

https://neo4j.com/labs/apoc/4.1/overview/apoc.do/apoc.do.when/

do you want to work on the actually query? 

Reuben
Graph Buddy

@glilienfield pardon me, how come when you paste your codes, they appear in a block form (image kind of)? any trick there? 😬

yes...you need to paste the code in the a code block.  First, click on the ellipse symbol in the tool bar (far left - see figure)

Screen Shot 2022-12-08 at 8.05.17 PM.png

Next, click on the code block symbol '</>':

Screen Shot 2022-12-08 at 8.05.28 PM.png

Alright, let me try, 😎

 

MATCH (f:flows)-[r]->(f2)
UNWIND keys({mat: 's', pro: 'p', feat: 'd'}) AS keyword
WITH f, f2, r, keyword
WHERE({mat: 's', pro: 'p', feat: 'd'})[keyword] <> '' AND type(r) = keyword AND f2.name CONTAINS {mat: 's', pro: 'p', feat: 'd'}[keyword]
WITH f.name as text, count(*) AS c ORDER BY c DESC, size(text) ASC
RETURN text