
Running out of heap memory

Hi,

I am using Neo4j Community Edition 3.5 and the following query throws a heap size error. Is there anything I could change in the query before I try to increase the heap size?

CALL apoc.periodic.iterate(
	"MATCH  ((c1:citation) -[p1:probability]-> (:lda_topic) <-[p2:probability]- (c2:citation))
	WHERE id(c1) < id(c2) 
	WITH c1, c2, toFloat(p1.prob) as w1, toFloat(p2.prob) as w2, 2 as precision
    WITH c1, c2, w1 + w2 as w12, 10^precision as factor
    WITH c1, c2, min(w12)/max(w12) as weight, factor
	WITH c1, c2, 1 - round(factor * weight) / factor as weight
	OPTIONAL MATCH (a:alias) -[:authored]-> (c1) 
	OPTIONAL MATCH (b:alias) -[:authored]-> (c2) 
	RETURN a, b, weight, c1.citation as citation1, c2.citation as citation2",
	"CREATE (a)-[e:through_topics_WJ]->(b)
	SET e.weight= weight, 
	 e.citation1 = citation1,
	 e.citation2 = citation2", {batchSize:5000})           // batch size reduced since there are a lot of paths - maybe filter toFloat(p1.prob) + toFloat(p2.prob) > 0.25 or higher (ideally > 1)??
YIELD batches, total, errorMessages
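
For context, here is a quick way to gauge how many (c1, c2) pairs the driving statement has to produce - just a sketch using the same labels and pattern as above:

	// count the rows the driving query would emit
	MATCH (c1:citation)-[:probability]->(:lda_topic)<-[:probability]-(c2:citation)
	WHERE id(c1) < id(c2)
	RETURN count(*) AS pairs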

Thanks


8 REPLIES

Kailash
Graph Buddy

Try this - there is a similar thread.

@Kailash @ganesanmithun323 @andrew.bowman thanks for the suggestions.

I tried the following:

Increased the heap size in the neo4j.conf file:

dbms.memory.heap.initial_size=8G
dbms.memory.heap.max_size=8G
dbms.memory.pagecache.size=8G
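
After restarting, the effective settings can be double-checked from the browser (a quick sanity check; dbms.listConfig is available in 3.5):

	// list the memory settings the server is actually using
	CALL dbms.listConfig() YIELD name, value
	WHERE name STARTS WITH 'dbms.memory'
	RETURN name, value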

and ran

CALL apoc.periodic.iterate(
	"MATCH  ((c1:citation) -[p1:probability]-> (:lda_topic) <-[p2:probability]- (c2:citation))
	WHERE id(c1) < id(c2) 
	WITH c1, c2, toFloat(p1.prob) as w1, toFloat(p2.prob) as w2, 2 as precision
    WITH c1, c2, w1 + w2 as w12, 10^precision as factor
    WITH c1, c2, min(w12)/max(w12) as weight, factor
	WITH c1, c2, 1 - round(factor * weight) / factor as weight
	OPTIONAL MATCH (a:alias) -[:authored]-> (c1) 
	OPTIONAL MATCH (b:alias) -[:authored]-> (c2) 
	RETURN a, b, weight, c1.citation as citation1, c2.citation as citation2",
	"CREATE (a)-[e:through_topics_WJ]->(b)
	SET e.weight= weight, 
	 e.citation1 = citation1,
	 e.citation2 = citation2", {batchSize:1000, iterateList:true, parallel:true})           // batch size reduced since there are a lot of paths
YIELD batches, total, errorMessages

and still get the heap size error.

I did not create any constraints on the new relationship since I am using the 3.5 Community version, not the Enterprise version. Also, I am using APOC 4.0.0.1. Please help.

Thanks,
Lavanya

Hi, don't use APOC 4.0.x - it looks like there is an issue with it. Try an older APOC jar.
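
For what it's worth, APOC's first two version numbers are meant to match the Neo4j version, so the 3.5.0.x series is the one built for Neo4j 3.5. The version the server actually loaded can be checked with:

	// returns the installed APOC version string
	RETURN apoc.version()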

What is the issue with APOC 4.0.x? Some other APOC procedure queries ran fine for me, except this one and the one below (which has been running for more than an hour now):

CALL apoc.periodic.iterate(
	"MATCH  ((c1:citation) -[p1:probability]-> (t:lda_topic) <-[p2:probability]- (c2:citation))
	WHERE id(c1) < id(c2) AND (toFloat(p1.prob) + toFloat(p2.prob) > 1)
	WITH c1, c2, toFloat(p1.prob) as w1, toFloat(p2.prob) as w2, t, 2 as precision
	WITH c1, c2, w1, w2, t, 10^precision as factor
	WITH c1, c2, t, round(factor* (1/(2+w1+w2))) / factor as weight
	OPTIONAL MATCH (a:alias) -[:authored]-> (c1) 
	OPTIONAL MATCH (b:alias) -[:authored]-> (c2) 
	RETURN a, b, weight, t, c1.citation as citation1, c2.citation as citation2",
	"CREATE (a)-[e:through_topic]->(b)
	SET e.weight= weight, 
	 e.topic = t.entity_type,
	 e.citation1 = citation1,
	 e.citation2 = citation2", {batchSize:5000})
YIELD batches, total, errorMessages
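
While it runs, its status can be watched from another session (dbms.listQueries is available in 3.5, and dbms.killQuery can terminate a query if needed):

	// show running queries and how long they have been going
	CALL dbms.listQueries() YIELD queryId, query, elapsedTimeMillis
	RETURN queryId, query, elapsedTimeMillis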

Let me know if this is related to APOC 4.0.x.

Thanks,
Lavanya

I am not very certain about it, but it's mentioned in the thread shared by Kailash.

ACCEPTED SOLUTION

Yes, you want to avoid having eager operations in the driving (outer) query, so the min() and max() are the problem. Aggregations are eager, meaning that all rows must be materialized in memory for them to take place.

You will need to move the aggregation part of your query (and anything that is based upon those aggregations) into the updating query.

As for your later query, that doesn't have aggregations in the driving query, so that shouldn't be an issue. Check the creation counts to ensure it's executing its batches; it just might have a lot of data to process.
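
For example, a simple way to confirm batches are committing (a sketch, using the relationship type from the long-running query) is to watch this count grow from another session:

	// if this number keeps increasing, batches are being committed
	MATCH ()-[r:through_topic]->()
	RETURN count(r) AS created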

@andrew.bowman

Yes, my later query did indeed complete successfully after some time.

Regarding my first query, I will move the aggregate functions to the updating query, like below:

CALL apoc.periodic.iterate(
	"MATCH  ((c1:citation) -[p1:probability]-> (:lda_topic) <-[p2:probability]- (c2:citation))
	WHERE id(c1) < id(c2) < 626630
	WITH c1, c2, toFloat(p1.prob) as w1, toFloat(p2.prob) as w2, 2 as precision
    RETURN c1, c2, w1 + w2 as w12, 10^precision as factor",
    "WITH c1, c2, min(w12)/max(w12) as weight , factor
	WITH c1, c2, 1 - round(factor * weight) / factor as weight
	OPTIONAL MATCH (a:alias) -[:authored]-> (c1) 
	OPTIONAL MATCH (b:alias) -[:authored]-> (c2) 
	WITH a, b, weight, c1.citation as citation1, c2.citation as citation2
	CREATE (a)-[e:through_topics_WJ]->(b)
	SET e.weight= weight, 
	 e.citation1 = citation1,
	 e.citation2 = citation2", {batchSize:1000, iterateList:true, parallel:true})           // batch size reduced since there are a lot of paths - maybe filter toFloat(p1.prob) + toFloat(p2.prob) > 0.25 or higher (ideally > 1)??
YIELD batches, total, errorMessages

and will update here. Thanks.

Update: Ran within a minute!