on 04-27-2021 10:49 AM
I am trying to understand how I would map/interconnect row data to headers.
Below is a sample of the data and a drawing of what I think the output would look like.
|   | A | B | C | D |
|---|---|---|---|---|
| A | N | 2 | 3 | 4 |
| B | 5 | N | 7 | 8 |
| C | 9 | 0 | N | 2 |
| D | 3 | 4 | 5 | N |
Here is my solution:
I created a CSV file that matches your data.
Col1,A,B,C,D
A,N,2,3,4
B,5,N,7,8
C,9,0,N,2
D,3,4,5,N
Step 1: As you are creating nodes with labels from column headers, I first created these nodes.
LOAD CSV WITH HEADERS from 'file:///joem.csv' as row
with row limit 1
with keys(row) as k1
// create one :Headers node per column header (index 0 is Col1)
CALL apoc.create.node(["Headers"], {name: k1[1]}) YIELD node AS n1
CALL apoc.create.node(["Headers"], {name: k1[2]}) YIELD node AS n2
CALL apoc.create.node(["Headers"], {name: k1[3]}) YIELD node AS n3
CALL apoc.create.node(["Headers"], {name: k1[4]}) YIELD node AS n4
RETURN n1, n2, n3, n4
Created nodes with names A, B, C, D
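To verify, a quick query such as the following lists the created nodes:
MATCH (h:Headers) RETURN h.name ORDER BY h.name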
Step 2:
LOAD CSV WITH HEADERS from 'file:///joem.csv' as row
with keys(row) as k1, row
match(a:Headers) where a.name = k1[1]
match(b:Headers) where b.name = k1[2]
match(c:Headers) where c.name = k1[3]
match(d:Headers) where d.name = k1[4]
//case when row.Col1 = k1[1]
FOREACH(ignoreMe IN CASE WHEN row.Col1 = k1[1] THEN [1] ELSE [] END |
merge (a)-[:ROW_WISE {val: toInteger(row.B)}]->(b)
merge (a)-[:ROW_WISE {val: toInteger(row.C)}]->(c)
merge (a)-[:ROW_WISE {val: toInteger(row.D)}]->(d)
)
//case when row.Col1 = k1[2]
FOREACH(ignoreMe IN CASE WHEN row.Col1 = k1[2] THEN [1] ELSE [] END |
merge (a)-[:COLUMN_WISE {val: toInteger(row.A)}]->(b)
merge (c)-[:COLUMN_WISE {val: toInteger(row.C)}]->(b)
merge (d)-[:COLUMN_WISE {val: toInteger(row.D)}]->(b)
)
//case when row.Col1 = k1[3]
FOREACH(ignoreMe IN CASE WHEN row.Col1 = k1[3] THEN [1] ELSE [] END |
merge (a)-[:COLUMN_WISE {val: toInteger(row.A)}]->(c)
merge (b)-[:COLUMN_WISE {val: toInteger(row.B)}]->(c)
merge (d)-[:COLUMN_WISE {val: toInteger(row.D)}]->(c)
)
//case when row.Col1 = k1[4]
FOREACH(ignoreMe IN CASE WHEN row.Col1 = k1[4] THEN [1] ELSE [] END |
merge (a)-[:COLUMN_WISE {val: toInteger(row.A)}]->(d)
merge (b)-[:COLUMN_WISE {val: toInteger(row.B)}]->(d)
merge (c)-[:COLUMN_WISE {val: toInteger(row.C)}]->(d)
)
Set 12 properties, created 12 relationships, completed after 18 ms.
Result:
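The created relationships can be listed with a query along these lines:
MATCH (a:Headers)-[r]->(b:Headers)
RETURN a.name AS source, type(r) AS rel, r.val AS val, b.name AS target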
While this is an interesting way to load data, it isn't standard, and it doesn't scale well, since for each additional node involved you would have to add a new column and update all rows. Ideally, when the number of fields being set isn't growing and only the number of nodes to connect or change is, your CSV should only need additional rows; the number of columns should remain fixed.
A much easier approach is with a CSV that has 3 columns: start, end, and value. Then you just build the CSV, each row representing a particular relationship between two nodes and the value to use.
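For illustration, assuming the A, B, C, D nodes are stored with numeric ids 1 through 4 (an assumption, since ids weren't given), a myNewData.csv in that shape might start like this, with one more row for each remaining non-diagonal cell of the matrix:
start,end,value
1,2,2
1,3,3
1,4,4
2,1,5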
Your load query becomes much easier. Assuming that your nodes are already in the graph and this is just about creating relationships between them, and assuming :Node labels and :REL relationships (since those weren't provided), numeric ids for the nodes, and numeric values on the relationships, a load of the relationships would be:
LOAD CSV WITH HEADERS from 'file:///myNewData.csv' as row
MATCH (start:Node {id:toInteger(row.start)}), (end:Node {id:toInteger(row.end)})
MERGE (start)-[:REL {value:toInteger(row.value)}]->(end)
For a sufficiently large CSV you would want to use USING PERIODIC COMMIT LOAD CSV instead.
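A minimal sketch of that batched form (the batch size of 10000 is an arbitrary choice for the example; newer Neo4j versions replace USING PERIODIC COMMIT with CALL { ... } IN TRANSACTIONS):
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS from 'file:///myNewData.csv' as row
MATCH (start:Node {id:toInteger(row.start)}), (end:Node {id:toInteger(row.end)})
MERGE (start)-[:REL {value:toInteger(row.value)}]->(end)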
Wow, thank you. I was afraid it would look like that; the complexity is going to make me review how I am approaching my data set, as it is much larger.
While this is an interesting way to load data, it isn't standard, and doesn't scale well:
It would be better if you provided your own model for this problem, to show that my model does not scale.
My comment about scaling was more about CSV creation and maintenance (my prior answer should have been a reply to the original question, not to your proposed solution). I wasn't looking much at your query for handling that strange CSV; I was more baffled by the CSV formatting.
The point I was trying to make is that for an arbitrary number of nodes to connect, you have to keep adding columns to the CSV and filling out those columns across all rows.
But that's not how a CSV should be built out. If no additional properties are being added or removed for the load, then the addition of new nodes to connect should only require adding new rows to the CSV, not new columns, and not backfilling for new columns either.
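As a hypothetical illustration (ids and values invented here): connecting a new node with id 5 in the start,end,value layout only means appending rows such as
5,1,6
1,5,7
whereas the matrix layout needs a new column added to every existing row, plus a whole new row.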
I agree with your standard scenarios. If a client wants to use Neo4j in scenarios other than the standard ones, from a business perspective you have to decide whether to say yes or no.