Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure `apoc.periodic.iterate`: Caused by: java.lang.OutOfMemoryError: Java heap space

I received the following error message:
Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure `apoc.periodic.iterate`: Caused by: java.lang.OutOfMemoryError: Java heap space
after running the following Cypher:

// Data loaded from files downloaded at https://www.fhwa.dot.gov/bridge/nbi/ascii.cfm and stored in the "import" folder for the database instance
LOAD CSV WITH HEADERS FROM "https://docs.google.com/spreadsheets/d/1S2yMzP30KfjQx2TBE42VjVnH8ZODLVN1lDGwmsPpPJY/export?format=csv&id=1S2yMzP30KfjQx2TBE42VjVnH8ZODLVN1lDGwmsPpPJY&gid=749188439" AS row1
WITH CASE
       WHEN NOT row1.Year IS NULL THEN collect(row1.URL)
     END AS fileURLs
UNWIND fileURLs AS fileURL
CALL apoc.periodic.iterate(
'
LOAD CSV WITH HEADERS FROM $url AS row RETURN row

','
MERGE (state:State {id: row.STATE_CODE_001})
MERGE (state)<-[:OF_STATE]-(county:County {id: row.COUNTY_CODE_003})
MERGE (county)<-[:OF_COUNTY]-(place:Place {id: row.PLACE_CODE_004})
MERGE (place)<-[:OF_PLACE]-(bridge:Bridge {id: row.STATE_CODE_001 + "_" + 
                                               row.COUNTY_CODE_003 + "_" + 
                                               row.PLACE_CODE_004 + "_" + 
                                               row.STRUCTURE_NUMBER_008 + 
                                               "_LAT_" + row.LAT_016 + 
                                               "_LONG_" +row.LONG_017})
ON CREATE SET bridge.name = row.STRUCTURE_NUMBER_008,
              bridge.latitude = row.LAT_016,
              bridge.longitude = row.LONG_017,
              bridge.yearbuilt = toInteger(row.YEAR_BUILT_027),
              place.name = row.PLACE_CODE_004,
              county.name = row.COUNTY_CODE_003,
              state.name = row.STATE_CODE_001
',
{batchSize:10000, parallel:false, params:{url:fileURL}}) YIELD batches, total
RETURN batches, total

This query loads a set of files based on URLs stored in a shared Google Sheet.
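
In case it helps narrow this down, here is a variant of the call that surfaces per-batch failures instead of just the counts (a sketch; it assumes my APOC build exposes the failedBatches and errorMessages yield fields, and the inner statement is trimmed to the first MERGE for brevity):

// Same call as above, but yielding the failure counters and error messages
CALL apoc.periodic.iterate(
  'LOAD CSV WITH HEADERS FROM $url AS row RETURN row',
  'MERGE (state:State {id: row.STATE_CODE_001})',
  {batchSize:10000, parallel:false, params:{url:fileURL}})
YIELD batches, total, failedBatches, errorMessages
RETURN batches, total, failedBatches, errorMessages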

I have run this before without any issues. I also have the following set in the config file:

dbms.memory.heap.initial_size=2G
dbms.memory.heap.max_size=2G
dbms.memory.pagecache.size=2G
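
For reference, neo4j-admin can recommend a heap/page-cache split for a given amount of RAM; a sketch, assuming Neo4j 3.5's memrec command and a hypothetical 8g dedicated to Neo4j:

# Prints suggested dbms.memory.* values for the given total memory
neo4j-admin memrec --memory=8g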

I recently upgraded to Neo4j Desktop 1.1.21 and have used both Neo4j Browser 3.2.18 and 3.2.19.

I am able to load most of the data from the files. The Browser says the size of my database is only 1.83 GB, and yet I am running out of heap space (set to 2 GB above).

Any thoughts on what could be the cause?


Also, my Neo4j DB version is 3.5.2. I am going to upgrade to 3.5.4 to see if anything changes.

How big are these CSV files? Total row count?

Any luck if you reduce the batch size?
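For example, the same call with a smaller commit size (a sketch; 1000 is just an arbitrary test value, and the inner statement is trimmed to the first MERGE):

CALL apoc.periodic.iterate(
  'LOAD CSV WITH HEADERS FROM $url AS row RETURN row',
  'MERGE (state:State {id: row.STATE_CODE_001})',
  {batchSize:1000, parallel:false, params:{url:fileURL}})
YIELD batches, total
RETURN batches, total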

The CSV files can be up to 10,000 rows each, and there are about 1,400 files. I have run this before, after adjusting the memory and heap settings, without issue. I am currently diving into the database folders within the Neo4j program folder.

I have also adjusted the batchSize and still get the error.

I have a sneaking suspicion there may be a memory leak somewhere...

I am also running this in a VM, so I am now checking the available RAM settings in the VM. Something may have recently changed in there that could be giving me issues.

I don't have any knowledge to support this, but I wonder if it could be something to do with the quantity of files. Could it be an issue similar to the one Apache Spark has when processing too many small files?

That could perhaps be the case. I have recently expanded the number of files included in the import. But I imagine there are others importing more, and much larger, files than I am.

I deleted all my nodes and relationships and am re-importing right now. I adjusted the heap/memory settings to 1 GB to see if the VM is the issue.

Nope. Same issue....

If you're able to start on a fresh DB, you can also check out the neo4j-admin import tool. It should be faster than using LOAD CSV.
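
A sketch of what that invocation can look like (assuming Neo4j 3.5, CSV files already prepared with the headers the import tool expects, and hypothetical file names):

# Offline bulk import into a brand-new store; the database must not exist yet
neo4j-admin import --database=bridges.db \
    --nodes:Bridge=bridges.csv \
    --relationships:OF_PLACE=of_place.csv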

I have used that in the past. The issue is that I will keep getting more files to add. Also, I am currently iterating through the data multiple times (with different Cypher statements) to add data. I am still getting to understand the data I have, so I have to clean things up as I go. It's not my data; it's data from a gov't site that I am trying to show.
