Code optimization to decrease run time within apoc.periodic.iterate

My current code below runs for a long time for 100001 alias nodes:


CALL apoc.periodic.iterate("MATCH (a:alias) RETURN a",
"Match 
path=((a) -- (c1:citation) -[p1]-> (t:BIOTERM) <-[p2]- (c2:citation) -- (b:alias))
WHERE id(a) < id(b) AND id(c1) <> id(c2)
With a, b, p1, p2, 2 as precision
 WITH a, b, p1, p2, 10^precision as factor
Create (a)-[e:through_topic]->(b)
Set e.weight= round(factor* (1/(2+p1.weight+p2.weight))) / factor", {batchSize:1000}) YIELD batches, total, errorMessages

When I ran it for a single alias:


Match 
path=((a:alias {name: 293} ) -- (c1:citation) -[p1]-> (t:BIOTERM) <-[p2]- (c2:citation) -- (b:alias))
WHERE id(a) < id(b) AND id(c1) <> id(c2)
With a, b, p1, p2, 2 as precision
 WITH a, b, p1, p2, 10^precision as factor
Create (a)-[e:through_topic]->(b)
Set e.weight= round(factor* (1/(2+p1.weight+p2.weight))) / factor

it completed in 1 or 2 ms. Should I try to optimize my code, experiment more with the batchSize of apoc.periodic.iterate, or both? Decreasing the batchSize did not help.

I ran EXPLAIN and PROFILE with

Thanks,
Lavanya

1 ACCEPTED SOLUTION

You may want to rearrange your query somewhat, doing the heavy lifting of the MATCH and calculation in your driving query, and only doing the CREATE in the updating query:

CALL apoc.periodic.iterate("MATCH  path=((a:alias) -- (c1:citation) -[p1]-> (t:BIOTERM) <-[p2]- (c2:citation) -- (b:alias))
WHERE id(a) < id(b) AND id(c1) <> id(c2)
WITH a, b, p1, p2, 2 as precision
 WITH a, b, p1, p2, 10^precision as factor
WITH a, b, round(factor* (1/(2+p1.weight+p2.weight))) / factor as weight
RETURN a, b, weight",
"CREATE (a)-[e:through_topic]->(b)
SET e.weight= weight", {batchSize:5000}) YIELD batches, total, errorMessages
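One detail in the weight expression may be worth checking. Cypher performs integer division when both operands are integers, so if the weight properties happen to be stored as integers (an assumption; the thread does not say how they are stored), 1/(2+p1.weight+p2.weight) would evaluate to 0 before any rounding. A minimal sketch of the difference, using the literal 3 as a stand-in for 2+p1.weight+p2.weight:

```cypher
// Integer operands: 1 / 3 is integer division and yields 0.
RETURN 1 / 3 AS intDivision;

// Float numerator: 1.0 / 3 is float division, so rounding to two
// decimals with the same factor trick as above gives 0.33.
WITH 10 ^ 2 AS factor
RETURN round(factor * (1.0 / 3)) / factor AS weight;
```

If the weights are already floats, the expression behaves as intended and no change is needed.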

As for execution time, if you're seeing around 500k rows being processed for just a single alias, then yes I would expect that this could take a long time.
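To get a feel for how much work the driving query produces before running the full update, a read-only count over the same pattern can help. This is only a sketch: it reuses the labels and filters from the query above, assumes the name property seen in the single-alias example is set on every alias node, and may itself take a while on the full graph.

```cypher
// Total rows the driving query would hand to apoc.periodic.iterate.
// Same pattern and filters as the update; nothing is written.
MATCH (a:alias)--(c1:citation)-[p1]->(t:BIOTERM)<-[p2]-(c2:citation)--(b:alias)
WHERE id(a) < id(b) AND id(c1) <> id(c2)
RETURN count(*) AS totalRows;

// The ten alias nodes that contribute the most rows.
MATCH (a:alias)--(c1:citation)-[p1]->(t:BIOTERM)<-[p2]-(c2:citation)--(b:alias)
WHERE id(a) < id(b) AND id(c1) <> id(c2)
RETURN a.name AS aliasName, count(*) AS rowCount
ORDER BY rowCount DESC
LIMIT 10;
```

Each counted row corresponds to one through_topic relationship that the update would create, so totalRows is also the number of writes to expect.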

You may also want to check your memory settings with neo4j-admin memrec.


