Neo4j

pavloN · ‎11-15-2019

I received a large file, and the two queries below perform badly on it, so they need to be optimized.
Splitting the queries up seems like a good idea, but then I would have to go through this big file several times.
Which is better: splitting the queries up, or improving them as they are?
Maybe somebody has ideas about how to optimize them?

CALL apoc.periodic.iterate('WITH apoc.convert.fromJsonList($data) as arr UNWIND arr as v RETURN v' ,' FOREACH ( i in CASE WHEN v.dog=false THEN [1] ELSE [] END | MERGE (c:Cat{id:v.id, version: "${version}"}))
FOREACH ( i in CASE WHEN v.dog=true THEN [1] ELSE [] END | MERGE (c:Dog{id:v.id, version: "${version}"}))
WITH v
MATCH (c{id:v.id, version:"${version}"})
UNWIND range(0, CASE WHEN size(v.weightOfAllCat) > size(v.weightOfAllDog) THEN size(v.weightOfAllCat) ELSE size(v.weightOfAllDog) END) as i
MERGE (p:Prod {ean: v.name, version: "${version}"})
MERGE (a:Pro {ean: v.name, version: "${version}"})

WITH v, c, p, a
CALL apoc.do.when(v.dog=false,  "MERGE (c)-[:PRI]->(p)  MERGE (c)-[:ALTER]->(a)",
"MERGE (c)-[:PRI_A]->(p) MERGE (c)-[:ALTER_A]->(a)",
{v:v, c:c, p:p, a:a}) YIELD value
RETURN value
',
{ batchSize: 5000, iterateList: true, parallel:true, params:{data:'${data}'}})

UNWIND split("${prod}", ",") as prod_id
MATCH (p:Prod{id:prod_id, version: "${version}"})<-[a:ALTER]-(c:Cat{version: "${version}"})
WITH max(toInteger(apoc.text.replace(c.id,'[A-Za-z+]', ""))) as max, p
MATCH (c:Cat{version: "${version}"})-[:ALTER]->(p)
WHERE toInteger(apoc.text.replace(c.id,'[A-Za-z+]', "")) = max
MATCH (c:Cat{version: "${version}"})-[d:ALTER]->(p)
MERGE (c)-[:PRIM]->(p)
MERGE (c)-[:ALTER_P]->(p)
DETACH DELETE d
RETURN collect(DISTINCT(p.prod_id)) as proc

Thomas_Silkjaer · ‎11-26-2019

First of all, you are running apoc.periodic.iterate in parallel mode while also creating relationships. Creating a relationship takes locks on both connected nodes, so you risk a deadlock (unless you are sure that no two relationships are created to the same nodes anywhere in the entire set).
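
A minimal sketch of that first point, assuming APOC is installed and that the JSON and the version are passed as query parameters (the names $data and $version are illustrative, and only the Cat/Prod branch of the question is shown). The call has the same shape as the one in the question, except that parallel is false, so batches are committed one after another and cannot block each other on shared nodes:

```cypher
// Sketch only: $data (a JSON string) and $version are assumed query parameters.
CALL apoc.periodic.iterate(
  'WITH apoc.convert.fromJsonList($data) AS arr UNWIND arr AS v RETURN v',
  'MATCH (c:Cat {id: v.id, version: $version})
   MERGE (p:Prod {ean: v.name, version: $version})
   MERGE (c)-[:PRI]->(p)',
  {batchSize: 5000, iterateList: true, parallel: false,
   params: {data: $data, version: $version}}
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages;
```

Returning errorMessages makes failed batches visible, since apoc.periodic.iterate reports batch failures in its result instead of raising them.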

MATCH (c{id:v.id, version:"${version}"}) does not specify a label, so it cannot use an index and has to scan all nodes. Adding a label and an index would help.
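
As a rough illustration of this point, assuming Neo4j 4.x or later index syntax and made-up index names (on Neo4j 3.5 the equivalent form is CREATE INDEX ON :Cat(id)): once the pattern carries a label and the lookup property is indexed, the planner can seek in the index instead of scanning every node. Each row in the question already carries v.dog, so the label is known at lookup time. The parameters $id and $version are placeholders.

```cypher
// Index names are illustrative, not taken from the original post.
CREATE INDEX cat_id FOR (c:Cat) ON (c.id);
CREATE INDEX dog_id FOR (d:Dog) ON (d.id);

// EXPLAIN shows the plan without running the query:
// a labelled lookup should use NodeIndexSeek rather than AllNodesScan.
EXPLAIN
MATCH (c:Cat {id: $id, version: $version})
RETURN c;
```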

You are also matching and merging nodes on multiple properties, e.g. MERGE (p:Prod {ean: v.name, version: "${version}"}). Are these properties covered by a composite index?
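
A sketch of what such composite indexes could look like for the two MERGE patterns in the question, again assuming Neo4j 4.x or later syntax and hypothetical index names (on 3.5 the form is CREATE INDEX ON :Prod(ean, version)). A composite index is generally only used when the query constrains all of its properties, which these MERGE patterns do.

```cypher
// One composite index per label used in the MERGE patterns; names are illustrative.
CREATE INDEX prod_ean_version FOR (p:Prod) ON (p.ean, p.version);
CREATE INDEX pro_ean_version FOR (a:Pro) ON (a.ean, a.version);
```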

Is this being imported into an existing database? If not, preprocessing the content into CSV files and using neo4j-admin import is likely the fastest approach (depending on the size of the dataset).

pavloN · ‎11-26-2019

@Thomas_Silkjaer Thank you!

Which is worse: reading a big object (about 70,000 rows) several times, or optimizing these difficult queries? (Node.js)