Bulk Import limitations

So I am trying to import 26 million rows in around 300 csv files (using bash to execute the bulk import code).

I've hit a limit on the number of CSVs I can reference before I'm told that the command is too long.

Some of those CSVs also have null values in columns. With the normal LOAD CSV I know how to deal with those, but with the bulk import (neo4j-admin.bat) I do not.

Any help would be appreciated.

You can use regular expressions for the files

e.g. --nodes:Person file-[0-9]+.csv.gz

Note that in a regexp you need to use .* instead of * to match any run of characters.
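To preview which files a pattern like this would pick up before starting the import, a rough check with grep -E can help. This is only a sketch: the file names are made up, and it assumes the importer matches the pattern against the whole file name (hence the ^ and $ anchors), which is worth confirming on a small test. Java and grep regex dialects differ in places, but this simple pattern is the same in both.

```shell
# Scratch directory with a few hypothetical export files.
workdir=$(mktemp -d)
touch "$workdir/file-1.csv.gz" "$workdir/file-22.csv.gz" \
      "$workdir/file-x.csv.gz" "$workdir/other.csv.gz"

# Pattern from the post, with the dots escaped so they only match a literal dot.
# Single quotes stop bash from treating it as a glob.
pattern='^file-[0-9]+\.csv\.gz$'

# List the file names and keep only those the pattern matches.
matched=$(ls "$workdir" | grep -E "$pattern" | LC_ALL=C sort)
echo "$matched"
```

When passing the pattern to the import tool from bash, quoting it the same way keeps the shell from expanding it first.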

The null values are skipped during import if the --ignore-empty-strings setting is set to true:


--ignore-empty-strings <true/false>
        Whether or not empty string fields, i.e. "" from input source are ignored, i.e.
        treated as null. Default value: false

You are a kind man for answering one of my questions again, Michael 🙂 Thanks, I'll give it a shot right now. (That saves me from having to adjust the data export out of Oracle, which I thought I'd have to labour through.)

It's best to try it out with a small subset first and only run the big one after all the kinks have been sorted out.

Saves a lot of waiting time 🙂

Good luck
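One way to build such a subset, as a minimal sketch: copy the header plus the first N data rows of every CSV into a separate directory and point the import at that. The directory names and the stand-in file below are hypothetical, and it assumes every file has a single header line and no quoted fields spanning several lines.

```shell
# Hypothetical locations; replace with your real export and sample directories.
src_dir=$(mktemp -d)
sample_dir=$(mktemp -d)

# Stand-in for a real export: a header plus 5000 data rows.
{ echo "id,name"; seq 1 5000 | sed 's/$/,x/'; } > "$src_dir/c_contracts.csv"

rows=1000   # data rows to keep per file
for f in "$src_dir"/*.csv; do
  # Keep the header line plus the first $rows data rows.
  head -n $((rows + 1)) "$f" > "$sample_dir/$(basename "$f")"
done

# Line count of the sampled file (header included).
wc -l < "$sample_dir/c_contracts.csv"
```

Relationship rows in a truncated sample may point at nodes that did not make it into the sample, so expect some of them to be rejected during the test run.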

What if I have different files per node?

i.e. --nodes:c_contracts:c_payments ......

and each node is named after a CSV file?

Hmmm, the problem is that I hit the character limit in bash when I have 111 tables to import from Oracle (using the multiple lines)...

I don't know of such a limit in bash. Did you use the regexps for the files?

And you can put all the command line options into a file too:

--f <file name>
        File containing all arguments, used as an alternative to supplying all arguments
        on the command line directly. Each argument can be on a separate line or multiple
        arguments per line separated by space. Arguments containing spaces needs to be
        quoted. Supplying other arguments in addition to this file argument is not
        supported.
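For the case of one label per CSV file, the arguments file can be generated with a small loop instead of being typed by hand. The sketch below is only that: the paths and the graph.db target are hypothetical, and it assumes the neo4j-import options --into and --nodes:Label as I remember them, so check them against the tool's own help output. Note that --nodes:c_contracts:c_payments on a single line would, as far as I know, give the same nodes both labels rather than load two files.

```shell
# Hypothetical paths; replace with your own export directory.
csv_dir=$(mktemp -d)
args_file="$csv_dir/import-args.txt"
touch "$csv_dir/c_contracts.csv" "$csv_dir/c_payments.csv"   # stand-in exports

# Options that are not per-table go first. Everything has to live in this
# file, because --f does not allow other arguments next to it.
{
  echo "--into \"$csv_dir/graph.db\""
  echo "--ignore-empty-strings true"
} > "$args_file"

# One --nodes line per CSV, with the label taken from the file name.
for f in "$csv_dir"/*.csv; do
  label=$(basename "$f" .csv)
  echo "--nodes:$label \"$f\"" >> "$args_file"
done

cat "$args_file"
# Then run the import with nothing else on the command line:
#   neo4j-import --f "$args_file"
```

Because the command line now only carries the file name, its length no longer grows with the number of tables.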

Using the option --ignore-empty-strings=true, I get the message: unrecognized option:'ignore-empty-strings'

... This is also not mentioned in the documentation (which I've looked at again, since I didn't see it there before posting the original post ... )

Oh sorry, I'm always using neo4j-import not neo4j-admin import

Ah I see.

That's a deprecated feature and therefore no longer documented on neo4j (though it is in some blogs).

It works on a test example for me, so yes, that fixes my null-value problem on bulk import (until neo4j-import is removed), and importing with the arguments file fixes my other issue.

Awesome stuff.