Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
02-19-2020 06:46 PM
Hi,
I am using Community edition 3.5 and the following query throws a heap size error. Is there anything I could change in the query before I try to increase the heap size?
CALL apoc.periodic.iterate(
"MATCH ((c1:citation) -[p1:probability]-> (:lda_topic) <-[p2:probability]- (c2:citation))
WHERE id(c1) < id(c2)
WITH c1, c2, toFloat(p1.prob) as w1, toFloat(p2.prob) as w2, 2 as precision
WITH c1, c2, w1 + w2 as w12, 10^precision as factor
WITH c1, c2, min(w12)/max(w12) as weight, factor
WITH c1, c2, 1 - round(factor * weight) / factor as weight
OPTIONAL MATCH (a:alias) -[:authored]-> (c1)
OPTIONAL MATCH (b:alias) -[:authored]-> (c2)
RETURN a, b, weight, c1.citation as citation1, c2.citation as citation2",
"CREATE (a)-[e:through_topics_WJ]->(b)
SET e.weight = weight,
e.citation1 = citation1,
e.citation2 = citation2", {batchSize:5000}) // batch size reduced since there are a lot of paths - maybe filter p1.prob + p2.prob > 0.25 or higher (ideally > 1)??
YIELD batches, total, errorMessages
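The comment in the query above suggests pre-filtering. As a sketch only (the 1.0 threshold is an assumption, borrowed from the filtered query later in this thread), narrowing the driving MATCH before any weights are computed would cut the number of rows reaching the aggregation:

```cypher
// Sketch: count how many citation pairs would survive a probability filter
// in the driving query. The > 1.0 threshold is an assumption to tune.
MATCH (c1:citation)-[p1:probability]->(:lda_topic)<-[p2:probability]-(c2:citation)
WHERE id(c1) < id(c2)
  AND toFloat(p1.prob) + toFloat(p2.prob) > 1.0
RETURN count(*) AS surviving_pairs
```

Running this count first gives a sense of how much data the batched query would still have to process.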
Thanks
Solved! Go to Solution.
02-19-2020 09:44 PM
Try This - there is a similar thread
02-20-2020 07:07 AM
@Kailash @ganesanmithun323 @andrew.bowman thanks for the suggestions:
I tried the following:
Increased the heap size in the neo4j.conf file:
dbms.memory.heap.initial_size=8G
dbms.memory.heap.max_size=8G
dbms.memory.pagecache.size=8G
and ran
CALL apoc.periodic.iterate(
"MATCH ((c1:citation) -[p1:probability]-> (:lda_topic) <-[p2:probability]- (c2:citation))
WHERE id(c1) < id(c2)
WITH c1, c2, toFloat(p1.prob) as w1, toFloat(p2.prob) as w2, 2 as precision
WITH c1, c2, w1 + w2 as w12, 10^precision as factor
WITH c1, c2, min(w12)/max(w12) as weight, factor
WITH c1, c2, 1 - round(factor * weight) / factor as weight
OPTIONAL MATCH (a:alias) -[:authored]-> (c1)
OPTIONAL MATCH (b:alias) -[:authored]-> (c2)
RETURN a, b, weight, c1.citation as citation1, c2.citation as citation2",
"CREATE (a)-[e:through_topics_WJ]->(b)
SET e.weight = weight,
e.citation1 = citation1,
e.citation2 = citation2", {batchSize:1000, iterateList:true, parallel:true}) // batch size reduced since there are a lot of paths
YIELD batches, total, errorMessages
and still get the heap size error.
I did not create any constraints on the new relationship that is created, since I am using the 3.5 Community edition, not Enterprise. Also, I am using APOC 4.0.0.1. Please help.
Thanks,
Lavanya
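One way to inspect what the driving statement will do (a sketch; assumes access to Neo4j Browser or cypher-shell) is to prefix it with EXPLAIN, which shows the query plan, including any EagerAggregation operator introduced by min()/max(), without executing anything:

```cypher
// Sketch: inspect the plan of the driving query without running it.
// An EagerAggregation operator in the output indicates the min()/max()
// step must hold all matching rows in memory at once.
EXPLAIN
MATCH (c1:citation)-[p1:probability]->(:lda_topic)<-[p2:probability]-(c2:citation)
WHERE id(c1) < id(c2)
WITH c1, c2, toFloat(p1.prob) + toFloat(p2.prob) AS w12
RETURN c1, c2, min(w12)/max(w12) AS weight
```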
02-20-2020 07:16 AM
Hi, don't use APOC 4.0.x; it looks like there is an issue with it. Try with an older APOC jar.
02-20-2020 08:02 AM
What is the issue with APOC 4.0.x? Some other APOC procedure queries ran fine for me, except this one and the one below (which has been running for more than an hour now):
CALL apoc.periodic.iterate(
"MATCH ((c1:citation) -[p1:probability]-> (t:lda_topic) <-[p2:probability]- (c2:citation))
WHERE id(c1) < id(c2) AND (toFloat(p1.prob) + toFloat(p2.prob) > 1)
WITH c1, c2, toFloat(p1.prob) as w1, toFloat(p2.prob) as w2, t, 2 as precision
WITH c1, c2, w1, w2, t, 10^precision as factor
WITH c1, c2, t, round(factor* (1/(2+w1+w2))) / factor as weight
OPTIONAL MATCH (a:alias) -[:authored]-> (c1)
OPTIONAL MATCH (b:alias) -[:authored]-> (c2)
RETURN a, b, weight, t, c1.citation as citation1, c2.citation as citation2",
"CREATE (a)-[e:through_topic]->(b)
SET e.weight = weight,
e.topic = t.entity_type,
e.citation1 = citation1,
e.citation2 = citation2", {batchSize:5000})
YIELD batches, total, errorMessages
Let me know if this is related to APOC 4.0.x.
Thanks,
Lavanya
02-20-2020 08:03 AM
I am not very certain about it, but it's mentioned in the thread shared by Kailash.
02-20-2020 11:44 AM
Yes, you want to avoid having eager operations in the driving (outer) query, so the min() and max() are the problem. Aggregations are eager, meaning that all rows must be manifested in memory for them to take place.
You will need to move the aggregation part of your query (and anything that is based upon those aggregations) into the updating query.
As for your later query, that doesn't have aggregations in the driving query, so that shouldn't be an issue. Check that the creation step is executing its batches; it may just have a lot of data to process.
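In outline, that restructuring can be sketched like this (a sketch only; the ellipses and the REL relationship type are placeholders, not a runnable query):

```cypher
CALL apoc.periodic.iterate(
  // driving query: stream raw rows only - no min()/max() here
  "MATCH ... RETURN c1, c2, w12, factor",
  // updating query: aggregate and write
  "WITH c1, c2, min(w12)/max(w12) AS weight, factor
   WITH c1, c2, 1 - round(factor * weight) / factor AS weight
   CREATE (c1)-[e:REL]->(c2) SET e.weight = weight",
  {batchSize:1000})
```

Note that an aggregation in the updating statement runs over each batch of rows rather than the whole result set, so rows belonging to the same (c1, c2) pair should ideally land in the same batch.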
02-20-2020 12:32 PM
Yes, my later query did complete successfully after some time.
Regarding my first query, I will move the aggregate functions into the updating query, like below:
CALL apoc.periodic.iterate(
"MATCH ((c1:citation) -[p1:probability]-> (:lda_topic) <-[p2:probability]- (c2:citation))
WHERE id(c1) < id(c2) AND id(c2) < 626630
WITH c1, c2, toFloat(p1.prob) as w1, toFloat(p2.prob) as w2, 2 as precision
RETURN c1, c2, w1 + w2 as w12, 10^precision as factor",
"WITH c1, c2, min(w12)/max(w12) as weight, factor
WITH c1, c2, 1 - round(factor * weight) / factor as weight
OPTIONAL MATCH (a:alias) -[:authored]-> (c1)
OPTIONAL MATCH (b:alias) -[:authored]-> (c2)
WITH a, b, weight, c1.citation as citation1, c2.citation as citation2
CREATE (a)-[e:through_topics_WJ]->(b)
SET e.weight= weight,
e.citation1 = citation1,
e.citation2 = citation2", {batchSize:1000, iterateList:true, parallel:true}) // batch size reduced since there are a lot of paths - maybe filter p1.prob + p2.prob > 0.25 or higher (ideally > 1)??
YIELD batches, total, errorMessages
and will update here. Thanks.
02-20-2020 01:11 PM
Update: Ran within a minute!