06-16-2021 03:58 PM
Hello,
I have a csv file with headers like the image below:
Is there a way to create nodes with ID attributes and nodes with TT attributes by reading the headers that begin with ID and TT?
06-17-2021 01:30 AM
Hi @familylife103 ,
You'll want to read up about LOAD CSV, which does just what you'd like: LOAD CSV - Neo4j Cypher Manual
For example (and depending on where you've got your CSV file located) you could do:
LOAD CSV FROM 'file:///ida.csv' AS line
CREATE (:ID {ida: line[0], idb: toInteger(line[2]), idc: line[4]} )
CREATE (:TT {tta: line[1], ttb: line[3] } )
This assumes that every line contains valid fields for both kinds of nodes. Note that without WITH HEADERS, a header row in the file is read as an ordinary data line.
Oh, when first using LOAD CSV it can be helpful to simply return a few lines like this:
LOAD CSV FROM 'file:///ida.csv' AS line
RETURN line LIMIT 10
Hope that's enough to get you started.
Best,
ABK
06-19-2021 02:34 PM
Thank you.
In fact I meant a node for each column, but I need a way to generalize the code instead of creating the nodes one by one!
06-23-2021 01:27 AM
Generalizing is often very specialized. 🙂
Let's say you had a simplified CSV like:
A,B,C,D
1,2,3,4
5,6,7,8
9,10,11,12
Then you could unwind the columns of each row, creating a node for each column.
WITH "http://localhost:11001/project-455d73b0-9c28-4a58-bb6e-9e5d0aae4072/example.csv" AS url
LOAD CSV WITH HEADERS FROM url AS rowMap
UNWIND keys(rowMap) AS columnHead
WITH columnHead, rowMap[columnHead] AS columnValue
CALL apoc.create.node([columnHead], {id: columnValue}) YIELD node
RETURN node
That would create 12 nodes, one for each value, labeled according to column header.
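If only the columns whose headers begin with ID or TT should become nodes, the same pattern can be narrowed with a filter on the header name. This is only a sketch under a few assumptions: the file name ida.csv is borrowed from the earlier example, the headers really do start with the literal prefixes ID and TT, and APOC is installed. Empty fields are skipped so that no node is created with a null id.

```cypher
// Sketch: one node per ID*/TT* column value, labeled by column header.
// Assumes 'file:///ida.csv' has a header row and APOC is available.
LOAD CSV WITH HEADERS FROM 'file:///ida.csv' AS rowMap
UNWIND keys(rowMap) AS columnHead
WITH columnHead, rowMap[columnHead] AS columnValue
// Keep only headers with the wanted prefixes, and skip empty fields
WHERE (columnHead STARTS WITH 'ID' OR columnHead STARTS WITH 'TT')
  AND columnValue IS NOT NULL
CALL apoc.create.node([columnHead], {id: columnValue}) YIELD node
RETURN node
```

Any other columns in the file are simply ignored, so new ID or TT columns can be added without touching the query.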
Best,
ABK
06-23-2021 12:18 PM
My advice is to reconsider how you're modeling this in your CSV. In general, adding data to a CSV file should result in more rows appended at the end of the file; you should not be adding columns. A row should hold the data needed to model either a node with its properties or some kind of association (perhaps a relationship) between nodes identified by those properties. Usually a single relationship is captured per row.
For example, a CSV for a social graph might look like:
personA, personB
1, 2
1, 3
1, 4
1, 5
3, 5
Each column represents an id. Each row represents a relationship to create between the persons with those ids. If we need to add, remove, or change the relationship data, we add, remove, or modify rows. There is no need to add or remove columns unless we need to work with more properties, or unless the new columns represent something else. In any case, I don't have to capture friend-of-a-friend or friend-of-a-friend-of-a-friend as additional and varying numbers of columns.
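As a rough sketch of how such a file could be loaded, one relationship per row: the file name friends.csv, the Person label, and the FRIEND_OF relationship type are all made up for illustration, and the header line is assumed to be written without spaces after the commas (otherwise the second key would be " personB" rather than "personB").

```cypher
// Sketch: one relationship per CSV row.
// 'friends.csv', :Person and :FRIEND_OF are hypothetical names.
LOAD CSV WITH HEADERS FROM 'file:///friends.csv' AS row
// MERGE avoids creating the same person twice across rows
MERGE (a:Person {id: toInteger(row.personA)})
MERGE (b:Person {id: toInteger(row.personB)})
MERGE (a)-[:FRIEND_OF]->(b)
```

Adding more friendships then only means appending rows; the query stays the same.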