04-19-2019 08:28 AM
I received the following error message:
Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure apoc.periodic.iterate: Caused by: java.lang.OutOfMemoryError: Java heap space
after running the following Cypher:
// Data loaded from files downloaded at https://www.fhwa.dot.gov/bridge/nbi/ascii.cfm and stored in the "import" folder for the database instance
LOAD CSV WITH HEADERS FROM "https://docs.google.com/spreadsheets/d/1S2yMzP30KfjQx2TBE42VjVnH8ZODLVN1lDGwmsPpPJY/export?format=csv&id=1S2yMzP30KfjQx2TBE42VjVnH8ZODLVN1lDGwmsPpPJY&gid=749188439" AS row1
WITH row1
WHERE row1.Year IS NOT NULL
WITH collect(row1.URL) AS fileURLs
UNWIND fileURLs AS fileURL
CALL apoc.periodic.iterate(
'
LOAD CSV WITH HEADERS FROM $url AS row RETURN row
','
MERGE (state:State {id: row.STATE_CODE_001})
MERGE (state)<-[:OF_STATE]-(county:County {id: row.COUNTY_CODE_003})
MERGE (county)<-[:OF_COUNTY]-(place:Place {id: row.PLACE_CODE_004})
MERGE (place)<-[:OF_PLACE]-(bridge:Bridge {id: row.STATE_CODE_001 + "_" +
row.COUNTY_CODE_003 + "_" +
row.PLACE_CODE_004 + "_" +
row.STRUCTURE_NUMBER_008 +
"_LAT_" + row.LAT_016 +
"_LONG_" +row.LONG_017})
ON CREATE SET bridge.name = row.STRUCTURE_NUMBER_008,
bridge.latitude = row.LAT_016,
bridge.longitude = row.LONG_017,
bridge.yearbuilt = toInteger(row.YEAR_BUILT_027),
place.name = row.PLACE_CODE_004,
county.name = row.COUNTY_CODE_003,
state.name = row.STATE_CODE_001
',
{batchSize:10000, parallel:false, params:{url:fileURL}}) YIELD batches, total
RETURN batches, total
This query loads a set of files based on URLs stored in a shared Google Sheet.
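For readers following the query logic, here is a minimal Python sketch of its two key-building steps. It assumes each CSV row is a dict keyed by the column names used above (`Year`, `URL`, and NBI names such as `STATE_CODE_001`). `file_urls` keeps the `URL` of every sheet row with a non-blank `Year`, treating blank cells as missing (an assumption about how the sheet marks rows to skip). `bridge_id` rebuilds the composite Bridge id and returns `None` when any component is missing, mirroring Cypher, where concatenating a string with null yields null.

```python
import csv
import io
from typing import List, Mapping, Optional

# Columns joined with "_" at the start of the Bridge id in the Cypher MERGE.
KEY_PARTS = ("STATE_CODE_001", "COUNTY_CODE_003", "PLACE_CODE_004", "STRUCTURE_NUMBER_008")


def file_urls(sheet_csv: str) -> List[str]:
    """Return the URL of every sheet row whose Year cell is filled in."""
    reader = csv.DictReader(io.StringIO(sheet_csv))
    # A blank or absent Year cell is treated as "skip this row" (assumption).
    return [row["URL"] for row in reader if (row.get("Year") or "").strip()]


def bridge_id(row: Mapping[str, Optional[str]]) -> Optional[str]:
    """Rebuild the composite Bridge id; None if any component is missing."""
    values = [row.get(col) for col in KEY_PARTS]
    lat, lon = row.get("LAT_016"), row.get("LONG_017")
    if any(v is None for v in values) or lat is None or lon is None:
        # In Cypher, "text" + null is null, so the MERGE key would be null too.
        return None
    return "_".join(values) + "_LAT_" + lat + "_LONG_" + lon
```

Counting the rows for which `bridge_id` returns `None` is one cheap way to spot records that the Bridge MERGE cannot key on.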
I have run this before without any issues. I also have the following set in the config file (neo4j.conf):
dbms.memory.heap.initial_size=2G
dbms.memory.heap.max_size=2G
dbms.memory.pagecache.size=2G
I recently upgraded to Neo4j Desktop 1.1.21 and have used both Neo4j Browser 3.2.18 and 3.2.19.
I am able to load most of the data from the files. The Browser reports a database size of only 1.83 GB, yet I am running out of heap space (set to 2 GB above).
Any thoughts on what could be the cause?
04-19-2019 08:44 AM
Also, my Neo4j database version is 3.5.2. I'm going to upgrade to 3.5.4 to see if anything changes.
04-19-2019 08:58 AM
How big are these CSV files? Total row count?
Any luck if you reduce the batch size?
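Reducing the batch size targets per-transaction memory: `apoc.periodic.iterate` commits each batch of `batchSize` rows in its own transaction, and a transaction's uncommitted changes are held in memory until it commits. A rough Python sketch of the upper bounds for one file, assuming every row creates new nodes and relationships (in practice MERGE reuses existing State, County, and Place nodes, so real counts are lower):

```python
import math
from typing import Dict

# The inner statement MERGEs at most 4 nodes (State, County, Place, Bridge)
# and 3 relationships (OF_STATE, OF_COUNTY, OF_PLACE) per input row.
NODES_PER_ROW = 4
RELS_PER_ROW = 3


def batch_plan(rows_in_file: int, batch_size: int) -> Dict[str, int]:
    """Upper bounds for loading one file with apoc.periodic.iterate."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    rows_per_tx = min(rows_in_file, batch_size)  # size of the largest batch
    return {
        "transactions": math.ceil(rows_in_file / batch_size),
        "max_nodes_per_tx": rows_per_tx * NODES_PER_ROW,
        "max_rels_per_tx": rows_per_tx * RELS_PER_ROW,
    }
```

For example, a 10,000-row file at `batchSize:10000` is a single transaction, while `batchSize:1000` splits it into ten transactions, each with a tenth of the worst-case state.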
04-19-2019 09:03 AM
The CSV files have up to 10,000 rows each, and there are about 1,400 files. I have run this before without issue after adjusting the memory and heap settings. I am currently digging into the database folders within the Neo4j program folder.
I have also adjusted the batch size and still get the error.
I have a sneaking suspicion there may be a memory leak somewhere...
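At up to 10,000 rows per file across about 1,400 files, the load is bounded at roughly 14 million rows (10,000 × 1,400). To get the actual total row count, a small Python sketch that counts data rows across CSV file contents already read into strings (fetching the files is left out; the header row and blank lines are not counted):

```python
import csv
import io
from typing import Iterable


def total_rows(csv_texts: Iterable[str]) -> int:
    """Count data rows (header excluded) across several CSV file contents."""
    total = 0
    for text in csv_texts:
        reader = csv.reader(io.StringIO(text))
        next(reader, None)  # skip the header row, if any
        total += sum(1 for row in reader if row)  # ignore blank lines
    return total
```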
04-19-2019 09:12 AM
I am also running this in a VM, so I am now checking the VM's available RAM settings. Something may have changed there recently that could be causing the issue.
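When Neo4j runs inside a VM, the VM needs room for more than the heap: roughly the heap maximum plus the page cache, plus headroom for the operating system and other JVM memory. A minimal Python sketch that reads those two settings from neo4j.conf text and adds a reserve. The `os_reserve_gb` value is a hypothetical placeholder to choose for your own setup, and only `k`/`m`/`g` size suffixes are handled:

```python
from typing import Dict

# Multipliers that convert a neo4j.conf size suffix to gigabytes.
TO_GB = {"k": 1 / (1024 * 1024), "m": 1 / 1024, "g": 1.0}


def parse_size_gb(value: str) -> float:
    """Convert a size such as '2G' or '512m' to gigabytes."""
    value = value.strip().lower()
    if not value or value[-1] not in TO_GB:
        raise ValueError(f"unsupported size: {value!r}")
    return float(value[:-1]) * TO_GB[value[-1]]


def read_conf(conf_text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blank lines and # comments."""
    settings: Dict[str, str] = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, val = line.partition("=")
            settings[key.strip()] = val.strip()
    return settings


def vm_ram_needed_gb(conf_text: str, os_reserve_gb: float) -> float:
    """Heap max + page cache + a reserve for the OS and other memory."""
    settings = read_conf(conf_text)
    heap = parse_size_gb(settings["dbms.memory.heap.max_size"])
    pagecache = parse_size_gb(settings["dbms.memory.pagecache.size"])
    return heap + pagecache + os_reserve_gb
```

With the settings from the first post (2 GB heap, 2 GB page cache), Neo4j alone needs about 4 GB of the VM's RAM before any OS reserve is added.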
04-19-2019 09:21 AM
I don't have anything concrete to support this, but I wonder if it could be related to the number of files, similar to the problems Apache Spark has when processing too many small files?
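If the number of files were the issue, one way to test that idea would be to combine the small files into fewer, larger ones before loading, so that fewer `apoc.periodic.iterate` calls are needed. A minimal Python sketch, assuming all files share one identical header row (it raises an error otherwise); `merge_csvs` and its `max_rows` chunk size are hypothetical names for illustration:

```python
import csv
import io
from typing import Iterable, List, Optional


def merge_csvs(csv_texts: Iterable[str], max_rows: int) -> List[str]:
    """Combine CSV contents sharing a header into chunks of <= max_rows rows."""
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    header: Optional[List[str]] = None
    chunks: List[List[List[str]]] = []
    current: List[List[str]] = []
    for text in csv_texts:
        reader = csv.reader(io.StringIO(text))
        file_header = next(reader, None)
        if file_header is None:
            continue  # empty file: nothing to merge
        if header is None:
            header = file_header
        elif file_header != header:
            raise ValueError("files do not share the same header")
        for row in reader:
            if not row:
                continue  # skip blank lines
            current.append(row)
            if len(current) == max_rows:
                chunks.append(current)
                current = []
    if current:
        chunks.append(current)

    # Write each chunk back out as CSV text with the shared header.
    merged = []
    for rows in chunks:
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
        merged.append(buf.getvalue())
    return merged
```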
04-19-2019 09:25 AM
That could be the case; I recently expanded the number of files included in the import. But I imagine others are importing more, and much larger, files than I am.
I deleted all my nodes and relationships and am re-importing right now. I adjusted the heap/memory settings to 1 GB to see if the VM is the issue.
04-19-2019 09:30 AM
Nope. Same issue....
04-19-2019 11:41 AM
If you're able to start on a fresh database, you could also try the neo4j-admin import tool. It should be faster than using LOAD CSV.
04-19-2019 12:01 PM
I have used that in the past. The issue is that I will keep getting more files to add. Also, I am currently making multiple passes (with different Cypher statements) to add data. I am still getting to know the data, so I have to clean things up as I go. It's not my data; it's data from a government site that I am trying to show.