Create nodes

Hello,
I have a CSV file with headers like the image below:

Is there a way to create nodes with ID attributes and nodes with TT attributes by reading the headers that begin with ID and TT?

Hi @familylife103 ,

You'll want to read up on LOAD CSV, which does just what you'd like; see the LOAD CSV page in the Neo4j Cypher Manual.

For example (and depending on where you've got your CSV file located) you could do:

LOAD CSV FROM 'file:///ida.csv' AS line
CREATE (:ID {ida: line[0], idb: toInteger(line[2]),  idc: line[4]} )
CREATE (:TT {tta: line[1], ttb: line[3] } )

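This assumes that every line contains valid fields for both kinds of nodes. Note also that LOAD CSV without WITH HEADERS treats the first line of the file as data, so a header row would be loaded as an ordinary line.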
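
One hedged way to relax that assumption is to load the file in two passes, so that a row missing its TT fields can still produce an ID node, and vice versa. This is only a sketch: it reuses the column positions from the example above, assumes the first line is a header row that should be skipped, and treats both null and empty strings as "missing".

```cypher
// Pass 1: create ID nodes, skipping the header row and rows with no ID value.
LOAD CSV FROM 'file:///ida.csv' AS line
WITH line SKIP 1
WITH line WHERE line[0] IS NOT NULL AND line[0] <> ''
CREATE (:ID {ida: line[0], idb: toInteger(line[2]), idc: line[4]});

// Pass 2: create TT nodes from the same file, with the same guards.
LOAD CSV FROM 'file:///ida.csv' AS line
WITH line SKIP 1
WITH line WHERE line[1] IS NOT NULL AND line[1] <> ''
CREATE (:TT {tta: line[1], ttb: line[3]});
```

The two statements are separated by semicolons, so they are meant to be run as a script (for example in Neo4j Browser or cypher-shell). Reading the file twice is slower than a single pass, but keeps each statement simple.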

Oh, when first using LOAD CSV it can be helpful to simply return a few lines like this:

LOAD CSV FROM 'file:///ida.csv' AS line
RETURN line LIMIT 10

Hope that's enough to get you started.

Best,
ABK

Thank you.
In fact I meant a node for each column, but I need a way to generalize the code instead of creating the nodes one by one!

Generalizing is often very specialized. 🙂

Let's say you had a simplified CSV like:

A,B,C,D
1,2,3,4
5,6,7,8
9,10,11,12

Then you could unwind the columns of each row, creating a node for each column.

WITH "http://localhost:11001/project-455d73b0-9c28-4a58-bb6e-9e5d0aae4072/example.csv" AS url
LOAD CSV WITH HEADERS FROM url AS rowMap
UNWIND keys(rowMap) AS columnHead
WITH columnHead, rowMap[columnHead] AS columnValue
CALL apoc.create.node([columnHead], {id: columnValue}) YIELD node
RETURN node

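That would create 12 nodes (3 rows × 4 columns), one for each value, each labeled with its column header and storing the value in its id property. Note that apoc.create.node requires the APOC library to be installed.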
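
To come back to the original question (only headers that begin with ID or TT), the same pattern can be narrowed with a filter on the header name. A minimal sketch, assuming the headers literally start with the upper-case prefixes ID and TT, that APOC is installed, and that the file is the ida.csv from the earlier reply; the property names column and value are illustrative choices, not anything the original data requires:

```cypher
LOAD CSV WITH HEADERS FROM 'file:///ida.csv' AS rowMap
UNWIND keys(rowMap) AS columnHead
WITH columnHead, rowMap[columnHead] AS columnValue
// Keep only non-empty cells whose header starts with one of the two prefixes.
WHERE columnValue IS NOT NULL
  AND (columnHead STARTS WITH 'ID' OR columnHead STARTS WITH 'TT')
// Label each node with the two-letter prefix and remember the full header.
CALL apoc.create.node(
  [left(columnHead, 2)],
  {column: columnHead, value: columnValue}
) YIELD node
RETURN count(node) AS created
```

Using the prefix as the label yields just two labels (ID and TT) no matter how many matching columns the file has, while the column property records which header each value came from.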

Best,
ABK

My advice is to reconsider how you're modeling this in your CSV. In general, adding data to a CSV file should result in more rows appended at the end of the file; you should not be adding more columns. A row should hold the data needed to model either a node with its properties or some kind of association (such as a relationship) between nodes identified by those properties. Usually that's a single relationship captured per row.

For example, a CSV for a social graph might look like:

personA, personB
1, 2
1, 3
1, 4
1, 5
3, 5

Each column represents an id. Each row represents a relationship to create between the persons with those ids. If we need to add, remove, or change the relationships to create, we add, remove, or modify rows. There is no need to add or remove columns unless we need to work with more properties, or unless the new columns represent something else. In any case, I don't have to capture friend-of-a-friend or friend-of-a-friend-of-a-friend as additional and varying numbers of columns.
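
As a rough sketch of how such a file could be loaded, one relationship per row: the file name friends.csv, the Person label, and the FRIEND_OF relationship type are all assumptions made for illustration. It also assumes the header line is written without a space after the comma (personA,personB), since LOAD CSV would otherwise keep the space as part of the second header name.

```cypher
LOAD CSV WITH HEADERS FROM 'file:///friends.csv' AS row
// MERGE avoids creating a second node when a person id appears on several rows.
MERGE (a:Person {id: toInteger(trim(row.personA))})
MERGE (b:Person {id: toInteger(trim(row.personB))})
// One relationship per row; MERGE keeps a re-run from duplicating it.
MERGE (a)-[:FRIEND_OF]->(b)
```

With the five data rows shown above, this would leave five Person nodes (ids 1 to 5) and five FRIEND_OF relationships. Friend-of-a-friend paths then come from traversing the graph rather than from extra columns.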