Neo4j

skmami · ‎05-13-2020

Greetings,

I have the following query, and my file contains about 30 million records. Is there a way to make it run faster?

It has been running for well over 40 minutes and is still going.

CALL apoc.periodic.iterate('
LOAD CSV WITH HEADERS FROM
"file:///home/pdss01/satish/intl/part-00004-4df038b2-3d1d-47ba-ad35-f391e09d7306-c000.csv" AS line return line.FAREID as fareid, toInteger(line.TARIFF_NBR) as tariff ','
match (f:Fare {ID: fareid})
match (ft:FareTariff {name: tariff})
CREATE (f)-[fft:fare_to_faretariff]->(ft)
',{batchSize:1, iterateList:true, parallel:true})

There are fewer tariff numbers than fares, so I am afraid I might get deadlock errors if I use a bigger batchSize.

Thanks

koji · ‎05-13-2020

Hi,

I think it would be faster if you created an index before CALL apoc.periodic.iterate.

for 4.x

CREATE INDEX id FOR (n:Fare) ON (n.ID);
CREATE INDEX name FOR (n:FareTariff) ON (n.name);

for 3.x

CREATE INDEX ON :Fare(ID);
CREATE INDEX ON :FareTariff(name);

skmami · ‎05-13-2020

Thanks @koji. I have created constraints on both nodes. Wouldn't that be enough? I thought constraints created an index.

CREATE CONSTRAINT ON (f:FareTariff) ASSERT f.name IS UNIQUE;
CREATE CONSTRAINT ON (f:FareBasis) ASSERT f.name IS UNIQUE;

Also, I verified with CALL db.constraints(); that my constraints were created properly:

"constraint_9fff29c0"	"CONSTRAINT ON ( faretariff:FareTariff ) ASSERT (faretariff.name) IS UNIQUE"

"constraint_f599caff"	"CONSTRAINT ON ( fare:Fare ) ASSERT (fare.ID) IS UNIQUE"

Is there any other way to check what is causing this to run so slowly?

Thanks again for your help.

koji · ‎05-14-2020

That's enough.
A uniqueness constraint (CREATE CONSTRAINT ... IS UNIQUE) also creates the backing index.
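
One way to check whether those indexes are actually used by the lookups is to prefix a small version of the query with EXPLAIN and read the plan. This is only a sketch: the literal values "F0001" and 1 are hypothetical placeholders, so substitute a FAREID and TARIFF_NBR that exist in the data.

```cypher
// Hypothetical literals; replace them with real values from the CSV.
// EXPLAIN shows the plan without executing the query.
EXPLAIN
MATCH (f:Fare {ID: "F0001"})
MATCH (ft:FareTariff {name: 1})
RETURN f, ft;
```

If the plan shows NodeUniqueIndexSeek for both nodes, the lookups are indexed; a NodeByLabelScan would mean the label or property in the query does not match the constraint. It may also be worth checking property types: LOAD CSV reads every field as a string, so FAREID is compared as a string unless it is converted the way TARIFF_NBR is with toInteger.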

intouch_vivek · ‎05-14-2020

Hi Satish,

Avoid parallel:true for complex executions.

Also, why have you set batchSize to 1? Its value should be based on how much data you want to process at a time. The default value is 10000.
https://neo4j.com/docs/labs/apoc/current/graph-updates/periodic-execution/

Could you please try the query below:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM
"file:///home/pdss01/satish/intl/part-00004-4df038b2-3d1d-47ba-ad35-f391e09d7306-c000.csv" AS line
match (f:Fare {ID: line.FAREID})
match (ft:FareTariff {name: toInteger(line.TARIFF_NBR)})
CREATE (f)-[fft:fare_to_faretariff]->(ft)
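
The same advice (a larger batchSize and no parallel:true) could also be applied to the original apoc.periodic.iterate call. The following is a sketch rather than a tested fix; batchSize:10000 is simply the documented default, not a tuned value, and the YIELD columns are added only to surface failures.

```cypher
// Batches run one after another (parallel:false), so two batches never
// compete for a lock on the same FareTariff node.
CALL apoc.periodic.iterate('
LOAD CSV WITH HEADERS FROM
"file:///home/pdss01/satish/intl/part-00004-4df038b2-3d1d-47ba-ad35-f391e09d7306-c000.csv" AS line
RETURN line.FAREID AS fareid, toInteger(line.TARIFF_NBR) AS tariff
','
MATCH (f:Fare {ID: fareid})
MATCH (ft:FareTariff {name: tariff})
CREATE (f)-[:fare_to_faretariff]->(ft)
',{batchSize:10000, parallel:false})
YIELD batches, total, failedBatches, errorMessages
RETURN batches, total, failedBatches, errorMessages;
```

Creating a relationship locks both end nodes, which is why parallel batches that share a small set of FareTariff nodes tend to block each other or deadlock. Running serially with large batches avoids that while still committing far less often than batchSize:1.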

skmami · ‎05-14-2020

For some reason it is very, very slow. I am now looking into the import tool. Hopefully that works.

How to make this load csv go faster?