cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Question: How do I load in multiple node types from 1 CSV while skipping empty cells? (3.5.3)

Hi! I'm working on a healthcare related project for my capstone and had some questions about loading in large-ish csv files:

My csv will have about ~5,000 lines & ~10 or so columns, where column 0 is a global prop for all nodes, column 1 is label 1, columns 2-3 are props for label 1; column 4 is label 2, columns 5-6 are props for label 2, etc. Since I am not sure how to work with a csv with empty cells, I've just been filling in columns that aren't relevant for certain rows with 0's until now.

I should be able to delete all the excess "bad" nodes after loading in pretty easily, but I'm nervous about the inefficiencies of creating way too many unnecessary nodes for a csv this large. I want to stop making excess nodes with all "0" props, which means I probably should skip the cells that shouldn't be made. Is a foreach (ignoreMe) needed here, and how would one use it for a general context like mine?

Thanks 🙂
~Rachel

4 REPLIES 4

It's usually best to use a separate CSV per label/node type, but for a CSV like this you may want to do multiple passes on the CSV, once per specific node type, and filter based upon whether the row is of the type that you want to load. That way you only set the relevant properties for nodes of the type you're processing.

ameyasoft
Graph Maven

Assuming a sample data of this kind,
Col1 = ABS, Col2 = White, Col3 = Yes, Col4 = , Col5 = Blue, Col6 = No, Col7 = Yellow, Col8 = , Col9 = Unknown

Here Col4 and Col8 are blank. I created a test84.csv file that reflects the above data.

Here is query to import this data:

LOAD CSV WITH HEADERS FROM "file:/test84.csv" As line
WITH line

//Node1 Col1, Col2, Col3 ( no null values)

FOREACH(ignoreMe IN CASE WHEN line.Col1 IS NOT NULL THEN [1] ELSE END |

FOREACH(ignoreMe IN CASE WHEN line.Col2 IS NULL AND line.Col3 IS NULL THEN [1] ELSE [] END | 
	
	MERGE (a:Col1 {prop1: 0, prop2: 0})
)

FOREACH(ignoreMe IN CASE WHEN line.Col2 IS NOT NULL AND line.Col3 IS NOT NULL THEN [1] ELSE [] END | 
	
	MERGE (a:Col1 {prop1: line.Col2, prop2: line.Col3})
)

FOREACH(ignoreMe IN CASE WHEN line.Col2 IS NOT NULL AND line.Col3 IS NULL THEN [1] ELSE [] END | 
	
	MERGE (a:Col1 {prop1: line.Col2, prop2: 0})
)

FOREACH(ignoreMe IN CASE WHEN line.Col2 IS NULL AND line.Col3 IS NOT NULL THEN [1] ELSE [] END | 
	
	MERGE (a:Col1 {prop1: 0, prop2: line.Col3})
)

)

//Node2 Col4, Col5, Col6 (Col4 is null)

FOREACH(ignoreMe IN CASE WHEN line.Col4 IS NOT NULL THEN [1] ELSE END |

FOREACH(ignoreMe IN CASE WHEN line.Col5 IS NULL AND line.Col6 IS NULL THEN [1] ELSE [] END | 
	
	MERGE (b:Col4 {prop1: 0, prop2: 0})
)

FOREACH(ignoreMe IN CASE WHEN line.Col5 IS NOT NULL AND line.Col6 IS NOT NULL THEN [1] ELSE [] END | 
	
	MERGE (b:Col4 {prop1: line.Col5, prop2: line.Col6})
)

FOREACH(ignoreMe IN CASE WHEN line.Col5 IS NOT NULL AND line.Col6 IS NULL THEN [1] ELSE [] END | 
	
	MERGE (b:Col4 {prop1: line.Col5, prop2: 0})
)

FOREACH(ignoreMe IN CASE WHEN line.Col5 IS NULL AND line.Col6 IS NOT NULL THEN [1] ELSE [] END | 
	
	MERGE (b:Col4 {prop1: 0, prop2: line.Col6})
)

)

//Node3 Col7, Col8, Col9 (Col8 is null)

FOREACH(ignoreMe IN CASE WHEN line.Col7 IS NOT NULL THEN [1] ELSE END |

FOREACH(ignoreMe IN CASE WHEN line.Col8 IS NULL AND line.Col9 IS NULL THEN [1] ELSE [] END | 
	
	MERGE (c:Col7 {prop1: 0, prop2: 0})
)

FOREACH(ignoreMe IN CASE WHEN line.Col8 IS NOT NULL AND line.Col9 IS NOT NULL THEN [1] ELSE [] END | 
	
	MERGE (c:Col7 {prop1: line.Col8, prop2: line.Col9})
)

FOREACH(ignoreMe IN CASE WHEN line.Col8 IS NOT NULL AND line.Col9 IS NULL THEN [1] ELSE [] END | 
	
	MERGE (c:Col7 {prop1: line.Col8, prop2: 0})
)

FOREACH(ignoreMe IN CASE WHEN line.Col8 IS NULL AND line.Col9 IS NOT NULL THEN [1] ELSE [] END | 
	
	MERGE (c:Col7 {prop1: 0, prop2: line.Col9})
)

);

Here is result:
2X_3_3b32456bc56497e223dd8869ddf4e06acd3d64bd.png

You have to create a block of FOR EACH for every node you create. That's the only drawback. I had a similar situation and this method worked for me.

Ok Great! Thank you so much, I'll definitely try this out and let you know if I run into any issues 🙂

this was super helpful. appreciate it!

Nodes 2022
Nodes
NODES 2022, Neo4j Online Education Summit

All the sessions of the conference are now available online