05-13-2020 02:31 PM
Greetings,
I have the following query, and my file contains about 30 million records. Is there a way to make it run faster?
It has been running for well over 40 minutes and is still going.
CALL apoc.periodic.iterate('
LOAD CSV WITH HEADERS FROM
"file:///home/pdss01/satish/intl/part-00004-4df038b2-3d1d-47ba-ad35-f391e09d7306-c000.csv" AS line
RETURN line.FAREID AS fareid, toInteger(line.TARIFF_NBR) AS tariff
','
MATCH (f:Fare {ID: fareid})
MATCH (ft:FareTariff {name: tariff})
CREATE (f)-[fft:fare_to_faretariff]->(ft)
',{batchSize:1, iterateList:true, parallel:true})
There are fewer tariff numbers than fares, so many fares point at the same FareTariff node. I am afraid that I might get deadlock errors if I use a bigger batchSize.
Thanks
05-13-2020 06:52 PM
Hi,
I think it would be faster if you created indexes before calling apoc.periodic.iterate.
For 4.x:
CREATE INDEX id FOR (n:Fare) ON (n.ID);
CREATE INDEX name FOR (n:FareTariff) ON (n.name);
For 3.x:
CREATE INDEX ON :Fare(ID);
CREATE INDEX ON :FareTariff(name);
05-13-2020 07:23 PM
Thanks @koji. I have created constraints on both nodes. Wouldn't that be enough? I thought constraints created an index.
CREATE CONSTRAINT ON (f:FareTariff) ASSERT f.name IS UNIQUE;
CREATE CONSTRAINT ON (f:FareBasis) ASSERT f.name IS UNIQUE;
Also, I verified with CALL db.constraints(); that my constraints were created properly:
"constraint_9fff29c0" "CONSTRAINT ON ( faretariff:FareTariff ) ASSERT (faretariff.name) IS UNIQUE"
"constraint_f599caff" "CONSTRAINT ON ( fare:Fare ) ASSERT (fare.ID) IS UNIQUE"
Is there any other way to check what is causing this to run so slowly?
Thanks again for your help.
05-14-2020 03:57 PM
That's enough.
CREATE CONSTRAINT ON creates these indexes.
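One way to double-check that the lookups really use those indexes is to run EXPLAIN on the MATCH part by itself. This is only a sketch: the two literal values below are made-up placeholders, so substitute a real FAREID and TARIFF_NBR from your file. Also keep in mind that LOAD CSV reads every field as a string, so the value types have to match what is stored on the nodes, otherwise the MATCH finds nothing.

```cypher
// Placeholder values (assumptions): replace with a real fare ID and tariff number.
// EXPLAIN only shows the plan; it does not execute the query.
EXPLAIN
MATCH (f:Fare {ID: "SOME_FARE_ID"})
MATCH (ft:FareTariff {name: 123})
RETURN f, ft
```

If the plan shows an index seek operator (something like NodeUniqueIndexSeek) for both nodes, the constraints are being used. A NodeByLabelScan would mean the lookup is scanning every node with that label.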
05-14-2020 01:53 AM
Hi Satish,
Avoid parallel:true for complex executions.
Also, why did you set batchSize to 1? Its value should be based on how much data you want to process at a time. The default value is 10000.
https://neo4j.com/docs/labs/apoc/current/graph-updates/periodic-execution/
Could you please try the query below?
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM
"file:///home/pdss01/satish/intl/part-00004-4df038b2-3d1d-47ba-ad35-f391e09d7306-c000.csv" AS line
MATCH (f:Fare {ID: line.FAREID})
MATCH (ft:FareTariff {name: toInteger(line.TARIFF_NBR)})
CREATE (f)-[fft:fare_to_faretariff]->(ft)
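If you would rather stay with apoc.periodic.iterate, a rough sketch along the same lines is your original query with parallel turned off and a larger batch. The batchSize of 10000 here is simply the default mentioned above, not a tuned value, so treat it as a starting point.

```cypher
// Same statements as the original query; only the config map changes.
// parallel:false means batches run one after another, so they cannot
// contend for locks on the same FareTariff node.
CALL apoc.periodic.iterate('
LOAD CSV WITH HEADERS FROM
"file:///home/pdss01/satish/intl/part-00004-4df038b2-3d1d-47ba-ad35-f391e09d7306-c000.csv" AS line
RETURN line.FAREID AS fareid, toInteger(line.TARIFF_NBR) AS tariff
','
MATCH (f:Fare {ID: fareid})
MATCH (ft:FareTariff {name: tariff})
CREATE (f)-[:fare_to_faretariff]->(ft)
',{batchSize:10000, parallel:false})
```

With batchSize:1 every row is committed in its own transaction, which is likely a large part of the overhead on 30 million rows.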
05-14-2020 08:17 PM
For some reason it is very, very slow. I am now looking into the import tool. Hopefully that works.