10-15-2021 04:43 PM
I want to import a large JSON dataset (~750,000 records) using the APOC library and create nodes and relationships. My script was working, but slowly, and the Neo4j Community suggested batching the records for better efficiency. First of all, which method should I use: Periodic Commit or Periodic Iterate? For example, when I use commit, it executes without doing anything.
// Insert CPEs and CPE children - Cypher script
UNWIND ["nvdcpematch-1.0.json"] AS files
CALL apoc.periodic.commit("
  CALL apoc.load.json($files) YIELD value
  // Insert base platform
  UNWIND value.matches AS value_cpe
  WITH value_cpe LIMIT $limit
  MERGE (cpe:CPE {
    uri: value_cpe.cpe23Uri
  })
  // Insert children
  FOREACH (value_child IN value_cpe.cpe_name |
    MERGE (child:CPE {
      uri: value_child.cpe23Uri
    })
    MERGE (cpe)-[:parentOf]->(child)
  )", {parallel:true, files:files, limit:2000}
) YIELD updates RETURN updates
10-15-2021 06:27 PM
Hi, @damisg7!
You should try apoc.periodic.iterate(). It loads your data in transactional batches, optionally in parallel, so the heap memory is released after every batch and the load time is faster.
An example of usage is:
CALL apoc.periodic.iterate(
  'CALL apoc.load.jdbc("jdbc:mysql://localhost:3306/northwind?user=root","company") YIELD row',
  'CREATE (p:Person) SET p += row',
  { batchSize:10000, parallel:true })
YIELD batches, total
RETURN batches, total
10-16-2021 04:11 PM
I tried several combinations of this, but I keep running into an out-of-memory exception. I tried different batch sizes, different ways to split the query, etc. Still, I think this approach is the right one!
// Insert CPEs and CPE children - Cypher script
UNWIND ["nvdcpematch-1.0.json"] AS files
CALL apoc.periodic.iterate(
  "CALL apoc.load.json($files) YIELD value",
  "// Insert base platform
   UNWIND value.matches AS value_cpe
   MERGE (cpe:CPE {
     uri: value_cpe.cpe23Uri
   })
   // Insert children
   FOREACH (value_child IN value_cpe.cpe_name |
     MERGE (child:CPE {
       uri: value_child.cpe23Uri
     })
     MERGE (cpe)-[:parentOf]->(child)
   )",
  {parallel:true, batchSize:10000, params:{files:files}}
) YIELD batches, total RETURN batches, total
The exception I get is the one below. I have 8 GB of RAM, with dbms.memory.heap.initial_size=1.5G, dbms.memory.heap.max_size=3G, and dbms.memory.pagecache.size=1.5G.
Failed to invoke procedure `apoc.periodic.iterate`: Caused by: java.lang.OutOfMemoryError: Java heap space
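The next thing I plan to try, going by my reading of the apoc.load.json docs (so the JSON-path argument and config may well need adjusting), is pulling only the matches array out of the file instead of the whole document, and dropping parallel:true, since parallel MERGEs on the same :CPE nodes can fight over locks:
// Sketch only - assumes apoc.load.json accepts a JSON path as its second argument
// and that '$.matches[*]' emits one match object per row; not yet tested on the full file
CALL apoc.periodic.iterate(
  "CALL apoc.load.json($file, '$.matches[*]') YIELD value",
  "MERGE (cpe:CPE { uri: value.cpe23Uri })
   FOREACH (value_child IN value.cpe_name |
     MERGE (child:CPE { uri: value_child.cpe23Uri })
     MERGE (cpe)-[:parentOf]->(child)
   )",
  {parallel:false, batchSize:10000, params:{file:"nvdcpematch-1.0.json"}}
) YIELD batches, total RETURN batches, total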
10-16-2021 03:01 AM
Hello @damisg7
I assume the uri property is unique, so did you create a UNIQUE CONSTRAINT on it beforehand? Normally, your data will load faster with APOC when a UNIQUE CONSTRAINT is in place.
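If not, something like this should do it (adjust the syntax to your Neo4j version; the constraint name is just an example):
// Unique constraint on :CPE(uri) - Neo4j 4.x syntax
CREATE CONSTRAINT cpe_uri_unique IF NOT EXISTS
ON (c:CPE) ASSERT c.uri IS UNIQUE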
Regards,
Cobra
10-16-2021 06:25 AM
I already have it, thanks!