Neo4j

sboysel · ‎10-13-2019

I am trying to use neo4j-admin import to populate a Neo4j database from CSV input data. According to the documentation, escaping quotation marks with \" is not supported, but my input contains these and other formatting anomalies, so neo4j-admin import fails on the input CSV:

> neo4j-admin import --mode=csv --id-type=INTEGER \
>    --high-io=true \
>    --ignore-missing-nodes=true \
>    --ignore-duplicate-nodes=true \
>    --nodes:user="import/headers_users.csv,import/users.csv"
Neo4j version: 3.5.11
Importing the contents of these files into /data/databases/graph.db:
Nodes:
  :user
  /var/lib/neo4j/import/headers_users.csv
  /var/lib/neo4j/import/users.csv

Available resources:
  Total machine memory: 15.58 GB
  Free machine memory: 598.36 MB
  Max heap memory : 17.78 GB
  Processors: 8
  Configured max memory: -2120992358.00 B
  High-IO: true


IMPORT FAILED in 97ms. 
Data statistics is not available.
Peak memory usage: 0.00 B
Error in input data
Caused by:ERROR in input
  data source: BufferedCharSeeker[source:/var/lib/neo4j/import/users.csv, position:91935, line:866]
  in field: company:string:3
  for header: [user_id:ID(user), login:string, company:string, created_at:string, type:string, fake:string, deleted:string, long:string, lat:string, country_code:string, state:string, city:string, location:string]
  raw field value: yyeshua
  original error: At /var/lib/neo4j/import/users.csv @ position 91935 -  there's a field starting with a quote and whereas it ends that quote there seems to be characters in that field after that ending quote. That isn't supported. This is what I read: 'Universidad Pedagógica Nacional \"F'

My question is whether it is possible to skip or ignore poorly formatted rows of the CSV file, i.e. the rows for which neo4j-admin import throws an error. I could not find such an option in the docs. I understand that solutions exist using LOAD CSV and that CSVs ought to be preprocessed prior to import. Since my use case can tolerate some missing data, I wanted to see whether there is an option to ignore bad rows before I start properly reformatting the CSV input.
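For reference, a minimal sketch of the preprocessing fallback mentioned above, written in Python. It is not a neo4j-admin feature. It assumes one record per physical line (no newlines inside quoted fields), UTF-8 input, and 13 columns as in the header from the error message. The names `repair_line`, `parse_or_none`, and `clean_csv` are made up for illustration. It rewrites \" as a doubled quote ("") and drops any line that still does not parse cleanly.

```python
import csv

# Column count taken from the header in the error message; adjust as needed.
EXPECTED_FIELDS = 13


def repair_line(line: str) -> str:
    """Turn backslash-escaped quotes (\\") into doubled quotes ("").

    Caveat: a field that legitimately ends in a backslash right before its
    closing quote would be altered by this blunt replacement.
    """
    return line.replace('\\"', '""')


def parse_or_none(line: str, expected_fields: int = EXPECTED_FIELDS):
    """Return the parsed fields, or None if the line is not a clean record."""
    try:
        # strict=True makes the csv module raise on malformed quoting,
        # e.g. characters following a closing quote.
        rows = list(csv.reader([line], strict=True))
    except csv.Error:
        return None
    if len(rows) != 1 or len(rows[0]) != expected_fields:
        return None
    return rows[0]


def clean_csv(src_path, dst_path, expected_fields=EXPECTED_FIELDS):
    """Copy well-formed rows to dst_path; return (kept, skipped) counts."""
    kept = skipped = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for line in src:
            repaired = repair_line(line.rstrip("\r\n"))
            fields = parse_or_none(repaired, expected_fields)
            if fields is None:
                skipped += 1  # bad row: drop it instead of failing the import
                continue
            writer.writerow(fields)
            kept += 1
    return kept, skipped
```

Something like `clean_csv("import/users.csv", "import/users_clean.csv")` would then produce a file to pass to neo4j-admin import in place of the original, with the skipped count showing how much data was lost.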

Can neo4j-admin import skip CSV rows with formatting errors?