02-09-2019 08:33 AM
I am trying to load a large (4.5 GB) JSON file into Neo4j. The file is in JSONL format, meaning each JSON object is on its own line. There are about 5.3 million entries.
I read about the apoc.load..() functions but have a few questions:
Do I have to take care of periodic commits?
Can I split the file via apoc.load on the line endings?
Thanks in advance.
02-09-2019 10:26 AM
Hi Bert,
As far as I understand, if the JSON file is a list at the top level (and not a map), it is streamed; see https://github.com/neo4j-contrib/neo4j-apoc-procedures/blob/3.5/src/main/java/apoc/load/LoadJson.jav....
There is no periodic commit by default, but you can easily add one with apoc.periodic.iterate (untested code below, take care):
CALL apoc.periodic.iterate(
  "CALL apoc.load.json(....) YIELD value RETURN value",
  "CREATE (p:Person) SET p = value // placeholder for your CREATE/MERGE statement, which runs once for every JSON list element (every value)",
  {batchSize: 10000});
02-10-2019 01:01 AM
Thanks Stefan,
My problem is that the file as a whole is not proper JSON; instead, each line is a JSON object of its own. I will try some command-line magic to turn it into a JSON array.
Good to know that periodic commits are possible.
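The "command line magic" mentioned above could look roughly like the following Python sketch. It is only one possible approach, not the one actually used in this thread; the function names `jsonl_to_array` and `convert_file` and any file paths are hypothetical. It reads one line at a time, so memory use should stay bounded by the longest line rather than by the 4.5 GB file.

```python
import json
from typing import Iterable, TextIO


def jsonl_to_array(lines: Iterable[str], out: TextIO) -> int:
    """Write the JSON objects found one per line in `lines` to `out` as one JSON array.

    Lines are handled one at a time, so the whole input is never held in memory.
    Returns the number of objects written.
    """
    count = 0
    out.write("[")
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        json.loads(line)  # validate only; raises ValueError on a malformed line
        out.write(",\n" if count else "\n")
        out.write(line)
        count += 1
    out.write("\n]\n")
    return count


def convert_file(src_path: str, dst_path: str) -> int:
    """Convert a JSONL file at `src_path` (hypothetical path) into a JSON array file."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        return jsonl_to_array(src, dst)
```

Calling `convert_file("input.jsonl", "output.json")` with your own paths would produce a file whose top level is a list, which is the shape described in the reply above. The validation step can be dropped if speed matters more than catching malformed lines early.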
02-10-2019 12:40 PM
Thanks again, the import of the German Handelsregister is running now. It will take some time, as it is over 4 GB of JSON with more than 5,000,000 company entries.
02-10-2019 02:26 PM
Or not. The import is running out of memory on me: not enough heap space, even though I already increased it to 8 GB (dbms.memory.heap.max_size).
It looks like apoc.load.json($url) is not streaming and tries to load the whole file up front.
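One workaround that might help in this situation, though nothing in this thread confirms it, is to split the JSONL file into smaller chunk files so that no single load has to deal with the full 4 GB. The Python sketch below illustrates the idea; the function name `split_jsonl`, the chunk file naming, and the default chunk size are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Iterable, List


def split_jsonl(lines: Iterable[str], out_dir: str,
                lines_per_chunk: int = 100_000) -> List[Path]:
    """Distribute non-blank lines over numbered chunk files in `out_dir`.

    Each chunk file receives at most `lines_per_chunk` lines.
    Returns the paths of the chunk files in the order they were written.
    """
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    handle = None
    written = 0
    try:
        for line in lines:
            if not line.strip():
                continue  # skip blank lines
            if handle is None or written == lines_per_chunk:
                if handle is not None:
                    handle.close()
                # Hypothetical naming scheme: chunk_00000.jsonl, chunk_00001.jsonl, ...
                path = out / f"chunk_{len(paths):05d}.jsonl"
                handle = path.open("w", encoding="utf-8")
                paths.append(path)
                written = 0
            handle.write(line if line.endswith("\n") else line + "\n")
            written += 1
    finally:
        if handle is not None:
            handle.close()
    return paths
```

Passing an open file object as `lines` keeps memory use low, because the file is iterated line by line. Each resulting chunk is still JSONL, so it would need to be loaded or converted separately.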
02-15-2019 03:58 PM
Just to close this thread, I finally managed to conclude the import and wrote a bit about it: https://faboo.org/2019/02/handelregister-neo4j/
Thanks for the help.