Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
05-12-2022 12:34 PM
I'm suddenly seeing EXTREMELY slow times for LOAD CSV queries that have been fine and fast for months. I'm using neo4j-enterprise v4.4.6 on an AWS EC2 instance running a current Rocky Linux system.
According to "SHOW INDEXES", the database in question has an index on all relevant properties and labels.
Here is the query:
LOAD CSV WITH HEADERS FROM 'file://my/local/file.csv' AS row
CALL {
WITH row
WITH
row WHERE row.county_fips <> row.neighbor_fips
WITH
row,
row.county_fips + '000000000000000' AS fipsIDCounty,
row.neighbor_fips + '000000000000000' AS fipsIDNeighbor
MATCH (countyFeature:FIPSItem { fipsID: fipsIDCounty})
MATCH (neighbor:FIPSItem { fipsID: fipsIDNeighbor})
WITH
countyFeature,
neighbor
MERGE (countyFeature)-[:INFECTS]->(neighbor)
} IN TRANSACTIONS OF 1000 ROWS
The use of CALL {} IN TRANSACTIONS OF 10000 ROWS
is a recent step in my vain attempt to make this work at all.
Earlier this month (4-May-2022), the following earlier version of the same query finished in under 2 minutes:
LOAD CSV WITH HEADERS FROM 'file://my/local/file.csv' AS row
MATCH (fipsItem:FIPSItem { fipsID: row.county_fips + '000000000000000'})
MATCH (neighbor:FIPSItem { fipsID: row.neighbor_fips + '000000000000000'})
WHERE row.county_fips <> row.neighbor_fips
MERGE (fipsItem)-[r:INFECTS]-(neighbor)
The file being loaded has 16,892 records and has not changed.
I notice that three "INFO" entries like the following have been added to debug.log
while the most recent version is running:
2022-05-12 19:13:51.996+0000 INFO [o.n.c.i.ExecutionEngine] [tms-covid-b/11303b5a] Discarded stale query from the query cache after 102 seconds. Reason: CardinalityByLabelsAndRelationshipType(Some(LabelId(0)),Some(RelTypeId(9)),Some(LabelId(0))) changed from 4700.0 to 1.0, which is a divergence of 0.9997872340425532 which is greater than threshold 0.7325330418736514. Query id: 468
I have no clue what this means or if it's relevant. I see no other log entries of interest.
I've configured the instance with memrec, and that configuration hasn't changed. There is lots of memory, lots of disk, and lots of CPU.
This query that used to reliably finish in under 2 minutes is now adding 1,000 bindings each minute. That's at least an order of magnitude slower.
I invite guidance about what is going on.
05-12-2022 05:08 PM
Try this:
LOAD CSV WITH HEADERS FROM 'file://my/local/file.csv' AS row
WITH
row WHERE row.county_fips <> row.neighbor_fips
CALL {
WITH
row,
row.county_fips + '000000000000000' AS fipsIDCounty,
row.neighbor_fips + '000000000000000' AS fipsIDNeighbor
MATCH (neighbor:FIPSItem { fipsID: fipsIDNeighbor})
MATCH (countyFeature:FIPSItem { fipsID: fipsIDCounty})
WITH
countyFeature,
neighbor
MERGE (countyFeature)-[:INFECTS]->(neighbor)
} IN TRANSACTIONS OF 1000 ROWS
All the sessions of the conference are now available online