02-20-2019 05:19 PM
I am using LOAD CSV and apoc.periodic.iterate to import data. I have one .csv that contains the folder and file names of the files holding the data to be imported. Below is the code I currently have:
CALL apoc.periodic.iterate("
LOAD CSV WITH HEADERS FROM 'file:///NBIFilesListFiles.csv' AS line1
WITH collect('file:///'+line1.Folder+'/'+line1.File) as Files
UNWIND Files AS file
LOAD CSV WITH HEADERS FROM file AS row RETURN row",
"
MERGE (state:State {id: row.STATE_CODE_001})
MERGE (state)<-[:OF_STATE]-(county:County {id: row.COUNTY_CODE_003})
MERGE (county)<-[:OF_COUNTY]-(place:Place {id: row.PLACE_CODE_004})
MERGE (place)<-[:OF_PLACE]-(bridge:Bridge {id: row.STRUCTURE_NUMBER_008})
ON CREATE SET bridge.name = row.STRUCTURE_NUMBER_008,
bridge.latitude = row.LAT_016,
bridge.longitude = row.LONG_017,
bridge.yearbuilt = row.YEAR_BUILT_027,
bridge.deck_cond = row.DECK_COND_058,
bridge.superstructure_cond = row.SUPERSTRUCTURE_COND_059,
bridge.substructure_cond = row.SUBSTRUCTURE_COND_060,
bridge.channel_cond = row.CHANNEL_COND_061,
bridge.culvert_cond = row.CULVERT_COND_062,
bridge.structural_eval = row.STRUCTURAL_EVAL_067,
bridge.deck_geometry_eval = row.DECK_GEOMETRY_EVAL_068,
bridge.undclrence_eval = row.UNDCLRENCE_EVAL_069,
bridge.posting_eval = row.POSTING_EVAL_070,
bridge.waterway_eval = row.WATERWAY_EVAL_071,
bridge.appr_road_eval = row.APPR_ROAD_EVAL_072,
place.name = row.PLACE_CODE_004,
county.name = row.COUNTY_CODE_003,
state.name = row.STATE_CODE_001
",
{batchSize:1000,iterateList:true})
My goal is to use LOAD CSV to load the .csv that contains all the other file names to import, and pass those into apoc.periodic.iterate. Is that possible?
Right now what I have in the code above tends to bog down my machine and cause Neo4j Desktop to crash. I have adjusted these settings to try to help:
dbms.memory.heap.initial_size=1G
dbms.memory.heap.max_size=2G
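For readers tuning this: heap alone is often not the limiting factor for large imports; the page cache setting in neo4j.conf matters as well. A hedged example (the values are illustrative only, not a recommendation; size them to your machine's available RAM):

```
# neo4j.conf - illustrative sizing only
dbms.memory.heap.initial_size=2G
dbms.memory.heap.max_size=2G
dbms.memory.pagecache.size=2G
```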
03-14-2019 08:13 AM
SOLUTION: (explanation below)
LOAD CSV WITH HEADERS FROM "https://docs.google.com/spreadsheets/d/1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A/export?format=csv&id=1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A&gid=1318941318" AS row1
WITH CASE
WHEN NOT row1.Year IS NULL THEN collect(row1.URL)
END AS fileURLs
UNWIND fileURLs as fileURL
CALL apoc.periodic.iterate(
'
LOAD CSV WITH HEADERS FROM $url AS row RETURN row
//CALL apoc.load.csv($url) YIELD map AS row RETURN row
','
MERGE (state:State {id: row.STATE_CODE_001})
MERGE (state)<-[:OF_STATE]-(county:County {id: row.COUNTY_CODE_003})
MERGE (county)<-[:OF_COUNTY]-(place:Place {id: row.PLACE_CODE_004})
MERGE (place)<-[:OF_PLACE]-(bridge:Bridge {id: row.STRUCTURE_NUMBER_008})
ON CREATE SET bridge.name = row.STRUCTURE_NUMBER_008,
bridge.latitude = row.LAT_016,
bridge.longitude = row.LONG_017,
bridge.yearbuilt = row.YEAR_BUILT_027,
bridge.deck_cond = row.DECK_COND_058,
bridge.superstructure_cond = row.SUPERSTRUCTURE_COND_059,
bridge.substructure_cond = row.SUBSTRUCTURE_COND_060,
bridge.channel_cond = row.CHANNEL_COND_061,
bridge.culvert_cond = row.CULVERT_COND_062,
bridge.structural_eval = row.STRUCTURAL_EVAL_067,
bridge.deck_geometry_eval = row.DECK_GEOMETRY_EVAL_068,
bridge.undclrence_eval = row.UNDCLRENCE_EVAL_069,
bridge.posting_eval = row.POSTING_EVAL_070,
bridge.waterway_eval = row.WATERWAY_EVAL_071,
bridge.appr_road_eval = row.APPR_ROAD_EVAL_072,
place.name = row.PLACE_CODE_004,
county.name = row.COUNTY_CODE_003,
state.name = row.STATE_CODE_001
',
{batchSize:1000, parallel:false, params:{url:fileURL}}) YIELD batches, total
RETURN batches, total
You will notice that //CALL apoc.load.csv($url) YIELD map AS row RETURN row is commented out. While that is supposed to work, one of the files at the URL I was calling was throwing an error, so at this point I had to use LOAD CSV instead.
EXPLANATION
The first part of the code
LOAD CSV WITH HEADERS FROM "https://docs.google.com/spreadsheets/d/1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A/export?format=csv&id=1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A&gid=1318941318" AS row1
WITH CASE
WHEN NOT row1.Year IS NULL THEN collect(row1.URL)
END AS fileURLs
UNWIND fileURLs as fileURL
loads the file URLs from the shared Google Sheet. The WITH clause is included to "ignore" rows where the file URL has not been verified. All rows that satisfy the condition are collected into a list, and UNWIND then passes each URL into the apoc.periodic.iterate call. The code within apoc.periodic.iterate loads the data for each URL and adds it to the graph using MERGE.
02-20-2019 05:48 PM
If I recall correctly, we don't recommend using LOAD CSV within apoc.periodic.iterate(), but you can use apoc.load.csv().
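A minimal hedged sketch of that recommendation (the URL and the MERGE statement are placeholders; apoc.load.csv requires APOC to be installed and, for local files, apoc.import.file.enabled=true):

```
CALL apoc.periodic.iterate(
  'CALL apoc.load.csv($url) YIELD map AS row RETURN row',
  'MERGE (state:State {id: row.STATE_CODE_001})',
  {batchSize:1000, parallel:false, params:{url:'file:///example.csv'}}
) YIELD batches, total
RETURN batches, total
```

The outer statement streams rows; the inner statement is applied to them in batches, and the URL is supplied through the params entry of the config map.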
02-21-2019 07:22 AM
@andrew.bowman so the recommendation is to use apoc.load.csv() inside of apoc.periodic.iterate()?
02-21-2019 06:56 PM
@andrew.bowman I somehow skipped over that in the APOC library. I have taken that and applied it as follows (I simplified the MERGE statements from my original post):
CALL apoc.periodic.iterate(
'
CALL apoc.load.csv({"https://docs.google.com/spreadsheets/d/1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A/export?format=csv&id=1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A&gid=1318941318"}) yield map as row return row
','
WITH row.URL as url
CALL apoc.periodic.iterate(
'
CALL apoc.load.csv({url}) yield map as row1 RETURN row1
','
MERGE (state:State {id: row1.STATE_CODE_001})
ON CREATE SET state.name = row1.STATE_CODE_001
',
{batchSize:1000, iterateList:true})
',
{batchSize:10, iterateList:true})
I am getting an error on the line CALL apoc.load.csv({url}) yield map as row1 RETURN row1 that reads:
"Neo.ClientError.Statement.SyntaxError: Invalid input 'A': expected 'o/O' (line 8, column 2 (offset: 302))
"CALL apoc.load.csv({url}) yield map as row1 RETURN row1"
08-12-2019 07:15 AM
Just following up with some additional information about the websocket error I was encountering.
In several of the files there are internal ' and " characters within column entries. Using CALL apoc.load.csv(url, {header:true, quoteChar: "\u0000"}), in lieu of LOAD CSV WITH HEADERS, where \u0000 is the null character, I was able to go through all the file rows and create nodes. @jsmccrumb and @mdfrenchman helped me figure out how to use \u0000 to get around this issue.