

Load large JSON file with newlines as separator

Bert
Node Clone

I am trying to load a large (4.5 GB) JSON file into Neo4j. The file is in JSONL format, meaning each JSON object is on its own line. There are about 5.3 million entries.
I have read about the apoc.load..() functions but have a few questions:

Do I have to take care of periodic commits?
Can I split the file via apoc.load on the line endings?

Thanks in advance.

1 ACCEPTED SOLUTION

Bert
Node Clone

Just to close this thread, I finally managed to conclude the import and wrote a bit about it: https://faboo.org/2019/02/handelregister-neo4j/

Thanks for the help.


5 REPLIES 5

Hi Bert,

From my understanding, if the JSON file is essentially a list at the top level (and not a map), it is streamed; see https://github.com/neo4j-contrib/neo4j-apoc-procedures/blob/3.5/src/main/java/apoc/load/LoadJson.jav....

There is no periodic commit by default, but you can easily get one by wrapping the load in apoc.periodic.iterate (untested code below, take care):

call apoc.periodic.iterate(
"call apoc.load.json(....) yield value return value",
"  create (p:Person) set p = $value // placeholder for your create/merge... statement that operates on every json list element - aka every value",
{batchSize: 10000});

Bert
Node Clone

Thanks Stefan,

My problem is that the file is not proper JSON as a whole; instead, each line represents one JSON object. I will try some command-line magic to turn this into a JSON array.

Good to know that periodic commits are possible.
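As a rough illustration of that conversion (not the command the poster actually ran), here is a minimal Python sketch that wraps JSONL lines into a single JSON array without holding the whole file in memory. The function names `jsonl_to_array_chunks` and `convert` are made up for this example.

```python
import json
from typing import Iterable, Iterator, TextIO


def jsonl_to_array_chunks(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text of one JSON array, piece by piece, from JSONL lines."""
    yield "["
    first = True
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        json.loads(line)  # validate only; raises ValueError on a malformed line
        if not first:
            yield ","
        yield "\n" + line
        first = False
    yield "\n]\n"


def convert(src: TextIO, dst: TextIO) -> None:
    """Stream src (JSONL) into dst as a single JSON array."""
    for chunk in jsonl_to_array_chunks(src):
        dst.write(chunk)
```

It could be used as `with open("in.jsonl") as src, open("out.json", "w") as dst: convert(src, dst)`, where both file names are placeholders. Only one line is parsed at a time, so memory use stays small even for a multi-gigabyte input.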

Bert
Node Clone

Thanks again, the import of the German Handelsregister (commercial register) is running now. It will take some time, as it is over 4 GB of JSON with over 5,000,000 company entries.

Bert
Node Clone

Or not. The import fails with an out-of-memory (OOM) error: not enough heap space, even though I had already increased the heap to 8 GB (dbms.memory.heap.max_size).

Looks like apoc.load.json($url) is not streaming and tries to load the file upfront.
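One way to sidestep loading everything upfront is to read the JSONL file line by line on the client side and hand the data over in fixed-size batches. The Python sketch below shows only the batching step. It is an assumption about a possible workaround, not a description of how the import in this thread was eventually done, and `read_batches` is a hypothetical helper name.

```python
import json
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def read_batches(
    lines: Iterable[str], batch_size: int = 10000
) -> Iterator[List[Dict[str, Any]]]:
    """Lazily parse JSONL lines and yield them in lists of at most batch_size."""
    # Generator expression: lines are parsed only when a batch asks for them.
    parsed = (json.loads(line) for line in lines if line.strip())
    while True:
        batch = list(islice(parsed, batch_size))
        if not batch:
            return
        yield batch
```

Each batch could then be passed as a parameter to a Cypher statement of the form `UNWIND $rows AS row CREATE ...` through whichever driver is in use, one transaction per batch. The default of 10000 simply mirrors the batchSize suggested earlier in the thread.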
