Importing pipe-delimited CSV fails due to embedded commas?

I am trying to import a CSV file (using neo4j-admin import) that has a header row and uses pipe characters as delimiters. Some of the rows have commas in the fields (e.g., in an address field), which you would think would be irrelevant when the field delimiter is a pipe.

However, I'm seeing an error that makes me think that the import tool is either not considering pipes as delimiters, or maybe considering commas as delimiters in addition to the pipes.

I executed the following command:

neo4j-admin import --delimiter "|" --f scripts/load3.txt

and scripts/load3.txt contains:

--nodes:Type1 data/type1.csv

Here are the first 2 rows of data/type1.csv:

field1:string|field2:string|field3:string|field4:string|field5:string|field6:string|field7:string|field8:string|field9:string|field10:string|field11:int|field12:int|field13:string|field14:string|field15:string|field16:string|field17:ID(Type1-ID)|field18:string|field19:datetime|field20:string|field21:string|field22:string|field23:string|field24:string|field25:string|field26:string|field27:date|field28:string|field29:string|field30:string|field31:string|field32:string|field33:string|field34:string|field35:string|field36:string|field37:string|field38:string|field39:date|field40:string|field41:string
NV|TI|E2|OZ|N9JFNAWYZC7NISXE0C|Lanesboro|LC8O2AWVQB|""|VBYLX24ESGRT55KZ7N|""|26677|319|Stefany|82977|""|""|4208453280712768998|""|""|""|0259256186|""|7|0726014721|51451|8|2019-07-18|CA|A29FFO4YFVA20SKTRW|KUFFPM|W|""|1 655 964 2676|""|W7020CXL08|LCFFHU4RBZ6JSI5|1 057 256 5644|23|2019-07-18|597 CASTORO Canyon, Suite 3157, Boston, Virginia, 22713|""

This is the error message:

org.neo4j.unsafe.impl.batchimport.input.InputException: ERROR in input
  data source: BufferedCharSeeker[source:/Users/xyz/data/type1.csv, position:321, line:0]
  in field: field1:string|field2:string|field3:string|field4:string|field5:string|field6:string|field7:string|field8:string|field9:string|field10:string|field11:int|field12:int|field13:string|field14:string|field15:string|field16:string|field17:ID(Type1-ID)|field18:string|field19:datetime|field20:string|field21:string|field22:string|field23:string|field24:string|field25:string|field26:string|field27:date|field28:string|field29:string|field30:string|field31:string|field32:string|field33:string|field34:string|field35:string|field36:string|field37:string|field38:string|field39:date|field40:string|field41:string:2
  for header: [field1:string|field2:string|field3:string|field4:string|field5:string|field6:string|field7:string|field8:string|field9:string|field10:string|field11:int|field12:int|field13:string|field14:string|field15:string|field16:string|field17:ID(Type1-ID)|field18:string|field19:datetime|field20:string|field21:string|field22:string|field23:string|field24:string|field25:string|field26:string|field27:date|field28:string|field29:string|field30:string|field31:string|field32:string|field33:string|field34:string|field35:string|field36:string|field37:string|field38:string|field39:date|field40:string|field41:string]
  raw field value: Suite 3157
  original error: Extra column not present in header on line 1 in /Users/xyz/data/type1.csv with value Suite 3157
	at org.neo4j.unsafe.impl.batchimport.input.BadCollector$ExtraColumnsProblemReporter.exception(BadCollector.java:306)
	at org.neo4j.unsafe.impl.batchimport.input.BadCollector.collect(BadCollector.java:168)
	at org.neo4j.unsafe.impl.batchimport.input.BadCollector.collectExtraColumns(BadCollector.java:129)
	at org.neo4j.unsafe.impl.batchimport.input.csv.CsvInputParser.next(CsvInputParser.java:198)
	at org.neo4j.unsafe.impl.batchimport.input.csv.LazyCsvInputChunk.next(LazyCsvInputChunk.java:96)
	at org.neo4j.unsafe.impl.batchimport.input.csv.CsvInputChunkProxy.next(CsvInputChunkProxy.java:75)
	at org.neo4j.unsafe.impl.batchimport.ExhaustingEntityImporterRunnable.run(ExhaustingEntityImporterRunnable.java:57)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
	at org.neo4j.helpers.NamedThreadFactory$2.run(NamedThreadFactory.java:122)

Looks to me like it split the record on the pipes up to the point of the address, then it split on the commas in the address.
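
One way to test that reading is to split a sample row both ways and count the fields. The sketch below uses Python's standard csv module, which is not the parser neo4j-admin uses, and a shortened three-column stand-in for type1.csv; the sample text and the helper names are made up for illustration. If only commas were being treated as delimiters, the whole header would count as a single field, and the first piece after a comma in the address, Suite 3157, would be the first value with no header column. That would match the "raw field value" in the error above.

```python
import csv
import io

# Shortened, illustrative stand-in for data/type1.csv (not the real 41-column file).
SAMPLE = (
    "field1:string|field2:string|field3:string\n"
    'NV|""|597 CASTORO Canyon, Suite 3157, Boston, Virginia, 22713\n'
)


def column_counts(text, delimiter):
    """Return the number of fields on each line when splitting on `delimiter`."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [len(row) for row in reader]


def first_extra_value(text, delimiter):
    """Return the first data value that has no matching header column, or None."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1]
    return data[len(header)].strip() if len(data) > len(header) else None


print(column_counts(SAMPLE, ","))      # [1, 5]: header is one field, data row is five
print(column_counts(SAMPLE, "|"))      # [3, 3]: header and data row line up
print(first_extra_value(SAMPLE, ","))  # Suite 3157
```

If the comma split reproduces the extra column and the pipe split does not, the likely problem is that the --delimiter option never reached the importer, rather than anything in the data itself.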

Help me understand how to use neo4j-admin import correctly...

Thanks

Dave

1 REPLY

Running this directly from the command line just worked for me:

bin/neo4j-admin import --delimiter "|" --nodes:Type1 import/data1.csv

Have you tried moving --delimiter "|" into load3.txt? My suspicion is that when you use --f, all other options on the command line are ignored.
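
As a rough, untested sketch of that suggestion, the snippet below renders an argument file that carries the delimiter alongside the --nodes line from the original load3.txt. The helper name render_args_file is hypothetical, and whether the importer accepts the quoted "|" inside the file exactly as written here is an assumption to check against the documentation for your Neo4j version.

```python
def render_args_file(delimiter, node_label, csv_path):
    """Build the text of an argument file that holds every import option."""
    lines = [
        f'--delimiter "{delimiter}"',        # moved here from the command line
        f"--nodes:{node_label} {csv_path}",  # the line load3.txt already contained
    ]
    return "\n".join(lines) + "\n"


# Print the proposed file contents for review before saving them.
print(render_args_file("|", "Type1", "data/type1.csv"), end="")
```

Saved as scripts/load3.txt, the import would then be started with only the file argument: neo4j-admin import --f scripts/load3.txt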