cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

How to LOAD CSV from a csv file that has different amount of information for each line

Hello,

I have a csv file that I want to import but each line of the csv file contains data that i dont want to keep (need to be filtered) and length of each line is different (in one column might have two different information but it is still string). The csv file dont have header but it can be added using excel but i dont think its a good idea.

Example csv:
(id, typeA,typeB,typeC) --> header that i add in the csv

But some line, dont have type B, so it became like this
(id, typeA,typeC,1)

some are
(id,typeA,1,typeC,typeD)

Is there a way i can import this kind of csv without modifying the csv?

7 REPLIES 7

Or if you guys know any tools that can sort the csv/excel file based on their number of columns,is very much appreciated.

You can load a CSV that doesn't have a header or even ignore the header by using a SKIP 1. e.g.

load csv from "https://github.com/neo4j-contrib/training/raw/master/modeling/data/aircraft.csv" AS row
WITH row SKIP 1
WHERE size(row) > 1
RETURN row
LIMIT 20

We could then choose to process the lines differently depending on how many items they contain. e.g. maybe if we have 4 items we know that they are id,typeA,typeB,typeC, but if we have 3 items then they are id,typeA,typeC.

Or do you have another way of determining which type each column represents?

I can also determine what a column present by their initals because we have a standard system in naming them such as LDN (London) is Place and BNG (router) is an Equipment. Is there a way we can create a block of name that we can fill with list of initials and classify them based on that ?

For my csv, each line means one connection.

typeA --connected--> typeB --connected--> typeC and Id represents identification of one connection/pathway. Therefore i also need to use MERGE for my csv.

I'm looking for way where we could identify the number of column that is not null, and from that we could the data identically based on the number of column for each row.

Please provide one data line with initials(LDN) and other data columns so that I can suggest a solution.

I didnt quite understand this at first, but this actually what im looking for. So for example if i only want to import data with only 4 column for each line, i need to put the condition "WHERE size(row) < 5" . Therefore, It will skip the line with >=5 column right Is it right ?

Yep exactly! And then you could run the same query again with a different WHERE clause to handle the rows with more columns

sameerG
Graph Buddy

You need to do Data modeling exercise with Target Data Model in Mind and write your own transformation Logic in Cipher Queries.