Neo4j

lavanya_kannan · 02-03-2020

My current code below runs for a long time for 100001 alias nodes:


CALL apoc.periodic.iterate("MATCH (a:alias) RETURN a",
"Match 
path=((a) -- (c1:citation) -[p1]-> (t:BIOTERM) <-[p2]- (c2:citation) -- (b:alias))
WHERE id(a) < id(b) AND id(c1) <> id(c2)
With a, b, p1, p2, 2 as precision
 WITH a, b, p1, p2, 10^precision as factor
Create (a)-[e:through_topic]->(b)
Set e.weight= round(factor* (1/(2+p1.weight+p2.weight))) / factor", {batchSize:1000}) YIELD batches, total, errorMessages

When I ran it for a single alias, the query


Match 
path=((a:alias {name: 293} ) -- (c1:citation) -[p1]-> (t:BIOTERM) <-[p2]- (c2:citation) -- (b:alias))
WHERE id(a) < id(b) AND id(c1) <> id(c2)
With a, b, p1, p2, 2 as precision
 WITH a, b, p1, p2, 10^precision as factor
Create (a)-[e:through_topic]->(b)
Set e.weight= round(factor* (1/(2+p1.weight+p2.weight))) / factor

completed in 1 or 2 ms. Should I try to optimize my code, tune the batchSize of apoc.periodic.iterate, or both? Decreasing the batchSize did not help.

I ran EXPLAIN and PROFILE with

Thanks,
Lavanya

andrew_bowman · 02-03-2020

You may want to rearrange your query somewhat, doing the heavy lifting of the MATCH and calculation in your driving query, and only doing the CREATE in the updating query:

CALL apoc.periodic.iterate("MATCH  path=((a:alias) -- (c1:citation) -[p1]-> (t:BIOTERM) <-[p2]- (c2:citation) -- (b:alias))
WHERE id(a) < id(b) AND id(c1) <> id(c2)
WITH a, b, p1, p2, 2 as precision
 WITH a, b, p1, p2, 10^precision as factor
WITH a, b, round(factor* (1/(2+p1.weight+p2.weight))) / factor as weight
RETURN a, b, weight",
"CREATE (a)-[e:through_topic]->(b)
SET e.weight= weight", {batchSize:5000}) YIELD batches, total, errorMessages
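
One thing worth checking in the weight expression is the type of the weight properties. This is a minimal sketch with made-up literal values (w1 and w2 stand in for p1.weight and p2.weight; they are not taken from your data). Cypher does integer division when both operands are integers, so if the stored weights are integers, 1/(2+w1+w2) truncates to 0. Writing the numerator as 1.0 avoids that:

```cypher
// Hypothetical values standing in for p1.weight and p2.weight.
WITH 0.5 AS w1, 1.5 AS w2, 10^2 AS factor
RETURN round(factor * (1.0 / (2 + w1 + w2))) / factor AS weight
// 1.0 / 4.0 = 0.25, so weight = 0.25

// With integer weights and an integer numerator the division truncates:
// WITH 1 AS w1, 1 AS w2, 10^2 AS factor
// RETURN round(factor * (1 / (2 + w1 + w2))) / factor AS weight
// 1 / 4 = 0 in integer arithmetic, so weight = 0.0
```

If your weights are already stored as floats, the original expression behaves the same as the 1.0 version.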

As for execution time, if you're seeing around 500k rows being processed for just a single alias, then yes, I would expect the full run across all alias nodes to take a long time.
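
To get a feel for how many rows (and therefore how many relationships) a single alias produces, a count along these lines may help. It is only a sketch that reuses the pattern and the name: 293 example from the question; the aliases rowCount and distinctTargets are names I made up for illustration:

```cypher
// Count the rows the pattern yields for one alias, and how many
// distinct b nodes they reach. Each row would become one
// through_topic relationship in the original query.
MATCH (a:alias {name: 293})--(c1:citation)-[p1]->(t:BIOTERM)<-[p2]-(c2:citation)--(b:alias)
WHERE id(a) < id(b) AND id(c1) <> id(c2)
RETURN count(*) AS rowCount, count(DISTINCT b) AS distinctTargets
```

If rowCount is much larger than distinctTargets, the CREATE is writing many parallel relationships between the same pair of aliases, which would add to the run time.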

You may also want to check your memory settings with neo4j-admin memrec.

Code optimization to decrease run time within apoc.periodic.iterate