Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
08-05-2019 01:45 PM
Hi,
I have been dealing with a websocket error as described in this post this post.
Below is the query I have been trying to run when this error has been occuring:
MATCH (file:File)
WHERE NOT (file)-[:CONTAINS]->(:Row)
WITH file, collect(file.url) AS fileURLs
UNWIND fileURLs AS fileURL
CALL apoc.periodic.iterate(
'
LOAD CSV WITH HEADERS FROM $url AS row
RETURN row
','
CREATE (fileRow:Row {fileURl: $url,
createdOn: date(),
STATE_CODE_001: row.STATE_CODE_001,
HIGHWAY_DISTRICT_002: row.HIGHWAY_DISTRICT_002,
COUNTY_CODE_003: row.COUNTY_CODE_003,
PLACE_CODE_004: row.PLACE_CODE_004,
RECORD_TYPE_005A: row.RECORD_TYPE_005A,
ROUTE_PREFIX_005B: row.ROUTE_PREFIX_005B,
SERVICE_LEVEL_005C: row.SERVICE_LEVEL_005C,
ROUTE_NUMBER_005D: row.ROUTE_NUMBER_005D,
DIRECTION_005E: row.DIRECTION_005E,
FEATURES_DESC_006A: row.FEATURES_DESC_006A,
CRITICAL_FACILITY_006B: row.CRITICAL_FACILITY_006B,
FACILITY_CARRIED_007: row.FACILITY_CARRIED_007,
STRUCTURE_NUMBER_008: row.STRUCTURE_NUMBER_008,
LOCATION_009: row.LOCATION_009,
MIN_VERT_CLR_010: row.MIN_VERT_CLR_010,
KILOPOINT_011: row.KILOPOINT_011,
BASE_HWY_NETWORK_012: row.BASE_HWY_NETWORK_012,
LRS_INV_ROUTE_013A: row.LRS_INV_ROUTE_013A,
SUBROUTE_NO_013B: row.SUBROUTE_NO_013B,
LAT_016: row.LAT_016,
LONG_017: row.LONG_017,
DETOUR_KILOS_019: row.DETOUR_KILOS_019,
TOLL_020 : row.TOLL_020,
MAINTENANCE_021: row.MAINTENANCE_021,
OWNER_022: row.OWNER_022,
FUNCTIONAL_CLASS_026: row.FUNCTIONAL_CLASS_026,
YEAR_BUILT_027: row.YEAR_BUILT_027,
TRAFFIC_LANES_ON_028A: row.TRAFFIC_LANES_ON_028A,
TRAFFIC_LANES_UND_028B: row.TRAFFIC_LANES_UND_028B,
ADT_029: row.ADT_029,
YEAR_ADT_030: row.YEAR_ADT_030,
DESIGN_LOAD_031: row.DESIGN_LOAD_031,
APPR_WIDTH_MT_032: row.APPR_WIDTH_MT_032,
MEDIAN_CODE_033: row.MEDIAN_CODE_033,
DEGREES_SKEW_034: row.DEGREES_SKEW_034,
STRUCTURE_FLARED_035: row.STRUCTURE_FLARED_035,
RAILINGS_036A: row.RAILINGS_036A,
TRANSITIONS_036B: row.TRANSITIONS_036B,
APPR_RAIL_036C: row.APPR_RAIL_036C,
APPR_RAIL_END_036D: row.APPR_RAIL_END_036D,
HISTORY_037: row.HISTORY_037,
NAVIGATION_038: row.NAVIGATION_038,
NAV_VERT_CLR_MT_039: row.NAV_VERT_CLR_MT_039,
NAV_HORR_CLR_MT_040: row.NAV_HORR_CLR_MT_040,
OPEN_CLOSED_POSTED_041: row.OPEN_CLOSED_POSTED_041,
SERVICE_ON_042A: row.SERVICE_ON_042A,
SERVICE_UND_042B: row.SERVICE_UND_042B,
STRUCTURE_KIND_043A: row.STRUCTURE_KIND_043A,
STRUCTURE_TYPE_043B: row.STRUCTURE_TYPE_043B,
APPR_KIND_044A: row.APPR_KIND_044A,
APPR_TYPE_044B: row.APPR_TYPE_044B,
MAIN_UNIT_SPANS_045: row.MAIN_UNIT_SPANS_045,
APPR_SPANS_046: row.APPR_SPANS_046,
HORR_CLR_MT_047: row.HORR_CLR_MT_047,
MAX_SPAN_LEN_MT_048: row.MAX_SPAN_LEN_MT_048,
STRUCTURE_LEN_MT_049: row.STRUCTURE_LEN_MT_049,
LEFT_CURB_MT_050A: row.LEFT_CURB_MT_050A,
RIGHT_CURB_MT_050B: row.RIGHT_CURB_MT_050B,
ROADWAY_WIDTH_MT_051: row.ROADWAY_WIDTH_MT_051,
DECK_WIDTH_MT_052: row.DECK_WIDTH_MT_052,
VERT_CLR_OVER_MT_053: row.VERT_CLR_OVER_MT_053,
VERT_CLR_UND_REF_054A: row.VERT_CLR_UND_REF_054A,
VERT_CLR_UND_054B: row.VERT_CLR_UND_054B,
LAT_UND_REF_055A: row.LAT_UND_REF_055A,
LAT_UND_MT_055B: row.LAT_UND_MT_055B,
LEFT_LAT_UND_MT_056: row.LEFT_LAT_UND_MT_056,
DECK_COND_058: row.DECK_COND_058,
SUPERSTRUCTURE_COND_059: row.SUPERSTRUCTURE_COND_059,
SUBSTRUCTURE_COND_060: row.SUBSTRUCTURE_COND_060,
CHANNEL_COND_061: row.CHANNEL_COND_061,
CULVERT_COND_062: row.CULVERT_COND_062,
OPR_RATING_METH_063: row.OPR_RATING_METH_063,
OPERATING_RATING_064: row.OPERATING_RATING_064,
INV_RATING_METH_065: row.INV_RATING_METH_065,
INVENTORY_RATING_066: row.INVENTORY_RATING_066,
STRUCTURAL_EVAL_067: row.STRUCTURAL_EVAL_067,
DECK_GEOMETRY_EVAL_068: row.DECK_GEOMETRY_EVAL_068,
UNDCLRENCE_EVAL_069: row.UNDCLRENCE_EVAL_069,
POSTING_EVAL_070: row.POSTING_EVAL_070,
WATERWAY_EVAL_071: row.WATERWAY_EVAL_071,
APPR_ROAD_EVAL_072: row.APPR_ROAD_EVAL_072,
WORK_PROPOSED_075A: row.WORK_PROPOSED_075A,
WORK_DONE_BY_075B: row.WORK_DONE_BY_075B,
IMP_LEN_MT_076: row.IMP_LEN_MT_076,
DATE_OF_INSPECT_090: row.DATE_OF_INSPECT_090,
INSPECT_FREQ_MONTHS_091: row.INSPECT_FREQ_MONTHS_091,
FRACTURE_092A: row.FRACTURE_092A,
UNDWATER_LOOK_SEE_092B: row.UNDWATER_LOOK_SEE_092B,
SPEC_INSPECT_092C: row.SPEC_INSPECT_092C,
FRACTURE_LAST_DATE_093A: row.FRACTURE_LAST_DATE_093A,
UNDWATER_LAST_DATE_093B: row.UNDWATER_LAST_DATE_093B,
SPEC_LAST_DATE_093C: row.SPEC_LAST_DATE_093C,
BRIDGE_IMP_COST_094: row.BRIDGE_IMP_COST_094,
ROADWAY_IMP_COST_095: row.ROADWAY_IMP_COST_095,
TOTAL_IMP_COST_096: row.TOTAL_IMP_COST_096,
YEAR_OF_IMP_097: row.YEAR_OF_IMP_097,
OTHER_STATE_CODE_098A: row.OTHER_STATE_CODE_098A,
OTHER_STATE_PCNT_098B: row.OTHER_STATE_PCNT_098B,
OTHR_STATE_STRUC_NO_099: row.OTHR_STATE_STRUC_NO_099,
STRAHNET_HIGHWAY_100: row.STRAHNET_HIGHWAY_100,
PARALLEL_STRUCTURE_101: row.PARALLEL_STRUCTURE_101,
TRAFFIC_DIRECTION_102: row.TRAFFIC_DIRECTION_102,
TEMP_STRUCTURE_103: row.TEMP_STRUCTURE_103,
HIGHWAY_SYSTEM_104: row.HIGHWAY_SYSTEM_104,
FEDERAL_LANDS_105: row.FEDERAL_LANDS_105,
YEAR_RECONSTRUCTED_106: row.YEAR_RECONSTRUCTED_106,
DECK_STRUCTURE_TYPE_107: row.DECK_STRUCTURE_TYPE_107,
SURFACE_TYPE_108A: row.SURFACE_TYPE_108A,
MEMBRANE_TYPE_108B: row.MEMBRANE_TYPE_108B,
DECK_PROTECTION_108C: row.DECK_PROTECTION_108C,
PERCENT_ADT_TRUCK_109: row.PERCENT_ADT_TRUCK_109,
NATIONAL_NETWORK_110: row.NATIONAL_NETWORK_110,
PIER_PROTECTION_111: row.PIER_PROTECTION_111,
BRIDGE_LEN_IND_112: row.BRIDGE_LEN_IND_112,
SCOUR_CRITICAL_113: row.SCOUR_CRITICAL_113,
FUTURE_ADT_114: row.FUTURE_ADT_114,
YEAR_OF_FUTURE_ADT_115: row.YEAR_OF_FUTURE_ADT_115,
MIN_NAV_CLR_MT_116: row.MIN_NAV_CLR_MT_116,
FED_AGENCY: row.FED_AGENCY,
DATE_LAST_UPDATE: row.DATE_LAST_UPDATE,
TYPE_LAST_UPDATE: row.TYPE_LAST_UPDATE,
DEDUCT_CODE: row.DEDUCT_CODE,
REMARKS: row.REMARKS,
PROGRAM_CODE: row.PROGRAM_CODE,
PROJ_NO: row.PROJ_NO,
PROJ_SUFFIX: row.PROJ_SUFFIX,
NBI_TYPE_OF_IMP: row.NBI_TYPE_OF_IMP,
DTL_TYPE_OF_IMP: row.DTL_TYPE_OF_IMP,
SPECIAL_CODE: row.SPECIAL_CODE,
STEP_CODE: row.STEP_CODE,
STATUS_WITH_10YR_RULE: row.STATUS_WITH_10YR_RULE,
SUFFICIENCY_ASTERC: row.SUFFICIENCY_ASTERC,
SUFFICIENCY_RATING: row.SUFFICIENCY_RATING,
STATUS_NO_10YR_RULE: row.STATUS_NO_10YR_RULE})
',
{batchSize:1000,parallel:false,params:{url:fileURL}}) YIELD batches, total
RETURN batches, total
I was using Halin to monitor the Heap Size as seen in the following image:
It appears as though apoc.periodic.iterate
is holding onto memory and possibly leading to the websocket error. I have tried this query with a variety of configurations (batch size, RAM allocations, etc.) and it keeps happening. The Heap always approaches the limit set in the config settings no matter how small of a batch size I use or how much memory I allocate to it.
Can someone explain how apoc.period.iterate
uses memory? Is there a better approach I can use or it an issue with apoc.periodic.iterate
?
Any insight would be much appreciated.
Solved! Go to Solution.
08-12-2019 07:11 AM
In addition to adding {header:true}
to the apoc.load.csv
function, setting quoteChar: "\u0000"
appears to have rectified my websocket errors. Big thanks to help from @mdfrenchman and @jsmccrumb to get this figured out!
08-05-2019 02:15 PM
There's a couple worrying things here.
When you do WITH file, collect(file.url)
you're collecting, per file on a row, that file's single url
property. So per row you're creating 1-element collections then UNWINDing them. That doesn't seem like it's what you want to do. Maybe you wanted to just collect all file URLs and UNWIND them?
apoc.periodic.iterate()
doesn't work well with LOAD CSV, we'd suggest not using that approach. Maybe try using apoc.load.csv()
instead.
08-05-2019 05:40 PM
@andrew.bowman You are correct in that the inclusion of file
with collect()
is definitely not intentional. Thank you for pointing that out. I have been using LOAD CSV
in lieu of apoc.load.csv
because I have had issues getting them to work together.
08-05-2019 05:49 PM
I tried the following query based on the apoc documentation
MATCH (file:File)
WHERE NOT (file)-[:CONTAINS]->(:Row)
WITH collect(file.url) AS fileURLs
UNWIND fileURLs AS fileURL
CALL apoc.periodic.iterate(
'
CALL apoc.load.csv($url) yield map as row
RETURN row
','
CREATE (fileRow:Row) SET p = row
',
{batchSize:5000,parallel:false,params:{url:fileURL}}) YIELD batches, total
RETURN batches, total
which lead me to this error message:
You have communicated with me before on using the apoc.load.csv
with apoc.periodic.iterate
here.
08-10-2019 11:35 AM
@andrew.bowman I realized that my ArrayIndexOutOfBoundsException was caused by failing to include the parameter for headers. I changed CALL apoc.load.csv(url)
to CALL apoc.load.csv(url,{header:true})
which corrected that issue. PALM TO FACE
I am still getting a websocket error from some of the files and will be digging into those.
08-12-2019 07:11 AM
In addition to adding {header:true}
to the apoc.load.csv
function, setting quoteChar: "\u0000"
appears to have rectified my websocket errors. Big thanks to help from @mdfrenchman and @jsmccrumb to get this figured out!
All the sessions of the conference are now available online