
Running out of heap memory

Hi,

I am using Neo4j Community Edition 3.5 and the following query throws a heap size error. Is there anything I could change in the query before I try to increase the heap size?

CALL apoc.periodic.iterate(
	"MATCH  ((c1:citation) -[p1:probability]-> (:lda_topic) <-[p2:probability]- (c2:citation))
	WHERE id(c1) < id(c2) 
	WITH c1, c2, toFloat(p1.prob) as w1, toFloat(p2.prob) as w2, 2 as precision
    WITH c1, c2, w1 + w2 as w12, 10^precision as factor
    WITH c1, c2, min(w12)/max(w12) as weight, factor
	WITH c1, c2, 1 - round(factor * weight) / factor as weight
	OPTIONAL MATCH (a:alias) -[:authored]-> (c1) 
	OPTIONAL MATCH (b:alias) -[:authored]-> (c2) 
	RETURN a, b, weight, c1.citation as citation1, c2.citation as citation2",
	"CREATE (a)-[e:through_topics_WJ]->(b)
	SET e.weight= weight, 
	 e.citation1 = citation1,
	 e.citation2 = citation2", {batchSize:5000})           // batch size reduced since there are a lot of paths - maybe filter toFloat(p1.prob) + toFloat(p2.prob) > 0.25 or higher (ideally > 1)??
YIELD batches, total, errorMessages
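
For context, here is a quick way to gauge how many (c1, c2) pairs the driving statement has to produce - just a sketch using the same labels and pattern as above:

	// count the rows the driving query would emit
	MATCH (c1:citation)-[:probability]->(:lda_topic)<-[:probability]-(c2:citation)
	WHERE id(c1) < id(c2)
	RETURN count(*) AS pairs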

Thanks


8 REPLIES

Kailash
Graph Buddy

Try this - there is a similar thread.

@Kailash @ganesanmithun323 @andrew.bowman thanks for the suggestions.

I tried the following:

Increased the heap size in the neo4j.conf file:

dbms.memory.heap.initial_size=8G
dbms.memory.heap.max_size=8G
dbms.memory.pagecache.size=8G
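
After restarting, the effective settings can be double-checked from the browser (a quick sanity check; dbms.listConfig is available in 3.5):

	// list the memory settings the server is actually using
	CALL dbms.listConfig() YIELD name, value
	WHERE name STARTS WITH 'dbms.memory'
	RETURN name, value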

and ran

CALL apoc.periodic.iterate(
	"MATCH  ((c1:citation) -[p1:probability]-> (:lda_topic) <-[p2:probability]- (c2:citation))
	WHERE id(c1) < id(c2) 
	WITH c1, c2, toFloat(p1.prob) as w1, toFloat(p2.prob) as w2, 2 as precision
    WITH c1, c2, w1 + w2 as w12, 10^precision as factor
    WITH c1, c2, min(w12)/max(w12) as weight, factor
	WITH c1, c2, 1 - round(factor * weight) / factor as weight
	OPTIONAL MATCH (a:alias) -[:authored]-> (c1) 
	OPTIONAL MATCH (b:alias) -[:authored]-> (c2) 
	RETURN a, b, weight, c1.citation as citation1, c2.citation as citation2",
	"CREATE (a)-[e:through_topics_WJ]->(b)
	SET e.weight= weight, 
	 e.citation1 = citation1,
	 e.citation2 = citation2", {batchSize:1000, iterateList:true, parallel:true})           // batch size reduced since there are a lot of paths
YIELD batches, total, errorMessages

and still get the heap size error.

I did not create any constraints on the new relationship since I am using the 3.5 Community version, not the Enterprise version. Also, I am using APOC 4.0.0.1. Please help.

Thanks,
Lavanya

Hi, don't use APOC 4.0.x - it looks like there is an issue with it. Try an older APOC jar.
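
For what it's worth, APOC's first two version numbers are meant to match the Neo4j version, so the 3.5.0.x series is the one built for Neo4j 3.5. The version the server actually loaded can be checked with:

	// returns the installed APOC version string
	RETURN apoc.version()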

What is the issue with APOC 4.0.x? Some other APOC procedure queries ran fine for me, except this one and the one below (which has been running for more than an hour now):

CALL apoc.periodic.iterate(
	"MATCH  ((c1:citation) -[p1:probability]-> (t:lda_topic) <-[p2:probability]- (c2:citation))
	WHERE id(c1) < id(c2) AND (toFloat(p1.prob) + toFloat(p2.prob) > 1)
	WITH c1, c2, toFloat(p1.prob) as w1, toFloat(p2.prob) as w2, t, 2 as precision
	WITH c1, c2, w1, w2, t, 10^precision as factor
	WITH c1, c2, t, round(factor* (1/(2+w1+w2))) / factor as weight
	OPTIONAL MATCH (a:alias) -[:authored]-> (c1) 
	OPTIONAL MATCH (b:alias) -[:authored]-> (c2) 
	RETURN a, b, weight, t, c1.citation as citation1, c2.citation as citation2",
	"CREATE (a)-[e:through_topic]->(b)
	SET e.weight= weight, 
	 e.topic = t.entity_type,
	 e.citation1 = citation1,
	 e.citation2 = citation2", {batchSize:5000})
YIELD batches, total, errorMessages
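
While it runs, its status can be watched from another session (dbms.listQueries is available in 3.5, and dbms.killQuery can terminate a query if needed):

	// show running queries and how long they have been going
	CALL dbms.listQueries() YIELD queryId, query, elapsedTimeMillis
	RETURN queryId, query, elapsedTimeMillis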

Let me know if this is related to APOC 4.0.x.

Thanks,
Lavanya

I am not very certain about it, but it's mentioned in the thread shared by Kailash.

ACCEPTED SOLUTION

Yes, you want to avoid having eager operations in the driving (outer) query, so the min() and max() are the problem. Aggregations are eager, meaning that all rows must be materialized in memory for them to take place.

You will need to move the aggregation part of your query (and anything that is based upon those aggregations) into the updating query.

As for your later query, that doesn't have aggregations in the driving query, so that shouldn't be an issue. Check the creation counts to ensure it's executing its batches; it just might have a lot of data to process.
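
For example, a simple way to confirm batches are committing (a sketch, using the relationship type from the long-running query) is to watch this count grow from another session:

	// if this number keeps increasing, batches are being committed
	MATCH ()-[r:through_topic]->()
	RETURN count(r) AS created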

@andrew.bowman

Yes, my later query did indeed complete successfully after some time.

Regarding my first query, I will move the aggregate functions to the updating query, like below:

CALL apoc.periodic.iterate(
	"MATCH  ((c1:citation) -[p1:probability]-> (:lda_topic) <-[p2:probability]- (c2:citation))
	WHERE id(c1) < id(c2) < 626630
	WITH c1, c2, toFloat(p1.prob) as w1, toFloat(p2.prob) as w2, 2 as precision
    RETURN c1, c2, w1 + w2 as w12, 10^precision as factor",
    "WITH c1, c2, min(w12)/max(w12) as weight , factor
	WITH c1, c2, 1 - round(factor * weight) / factor as weight
	OPTIONAL MATCH (a:alias) -[:authored]-> (c1) 
	OPTIONAL MATCH (b:alias) -[:authored]-> (c2) 
	WITH a, b, weight, c1.citation as citation1, c2.citation as citation2
	CREATE (a)-[e:through_topics_WJ]->(b)
	SET e.weight= weight, 
	 e.citation1 = citation1,
	 e.citation2 = citation2", {batchSize:1000, iterateList:true, parallel:true})           // batch size reduced since there are a lot of paths - maybe filter toFloat(p1.prob) + toFloat(p2.prob) > 0.25 or higher (ideally > 1)??
YIELD batches, total, errorMessages

and will update here. Thanks.

Update: Ran within a minute!