apoc.refactor.mergeNodes Performance

Hi,

I am using Neo4j 4.4.8, and my database has 10M nodes and 3M relationships.

I want to merge nodes that have the same "Age" property, but the Cypher below does nothing: no hang, no crash, no error, and obviously no merges 🙂

There are 80 distinct Age values, which means the subquery runs 80 times.

MATCH (n:Person)
WITH DISTINCT n.Age AS props
UNWIND props AS prop
CALL {
  WITH prop
  MATCH (m:Person {Age: prop})
  WITH m ORDER BY m.Id ASC
  WITH collect(m) AS ns, count(m) AS cn
  WHERE cn > 1
  CALL apoc.refactor.mergeNodes(ns, {properties: {`.*`: 'discard'}}) YIELD node
  RETURN count(*) AS s
}
RETURN count(s)

Memory config:

dbms.memory.heap.initial_size=8g
dbms.memory.heap.max_size=12g
dbms.memory.pagecache.size=6g

Is there a problem with mergeNodes, or is my Cypher badly written?
Please help me.
Thanks

6 REPLIES

Try this:

MATCH (n:Person)
WITH n.Age AS age, collect(n) AS ns, count(*) AS cn
WHERE cn > 1
CALL apoc.refactor.mergeNodes(ns, {properties: {`.*`: 'discard'}}) YIELD node
RETURN age, cn

Thanks for your reply, but I want to merge nodes that have the same age. Your Cypher does not do that.

It should. In the ‘with’ clause, age is the only non-aggregated expression, so it acts as the grouping key: the nodes with the same age are grouped and collected together, and each ‘ns’ collection contains only the nodes that share one age.
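
A read-only way to check that grouping, as a minimal sketch: it uses the same ‘with’ clause but returns the group sizes instead of merging, so you can confirm that each age produces exactly one row. The alias `collected` is only an illustrative name.

```cypher
// Dry run: same grouping as the merge query, but nothing is written.
MATCH (n:Person)
WITH n.Age AS age, collect(n) AS ns, count(*) AS cn
WHERE cn > 1
// One row per age; size(ns) should equal cn for every row.
RETURN age, cn, size(ns) AS collected
ORDER BY cn DESC
```

If this returns one row per duplicated age, the merge query receives one ‘ns’ list per age as well.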

You are right, thank you for your answer.
But like my code, this also doesn't do anything. I solved the problem by using apoc.periodic.commit, but it runs very slowly. 😞

CALL apoc.periodic.commit(
"MATCH (n:Person) WITH n LIMIT $limit
WITH n.Age AS age, collect(n) AS ns, count(*) AS cn
WHERE cn > 1
CALL apoc.refactor.mergeNodes(ns, {properties: {`.*`: 'discard'}}) YIELD node
RETURN count(*)", {limit: 10000})

Periodic commit executes the cypher statement repeatedly, each time in its own transaction, until the statement returns 0, at which point it stops. In your case, the query takes the first 10000 nodes, merges those that share an age, and repeats until cn > 1 is no longer true for any age in the batch. Each batch of 10000 nodes may not contain all the nodes for the ages represented in that batch, so it can take more than one round to fully merge a specific age value.

You should try cypher’s ‘CALL { ... } IN TRANSACTIONS OF 10000 ROWS’ statement instead. If you are executing in the browser, you will need to add “:auto” at the very beginning of your query.

I also think the call subquery only needs to enclose the ‘write’ part of the query, which is the call to the apoc procedure.

https://neo4j.com/docs/cypher-manual/current/clauses/call-subquery/#subquery-call-in-transactions
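
A minimal sketch of that suggestion, untested against your data: the grouping stays outside and only the apoc call sits inside the subquery. The batch size of 1 row (one age group per transaction) is my assumption, not something from the docs, and whether a subquery with RETURN is accepted together with IN TRANSACTIONS may depend on your Neo4j version.

```cypher
:auto MATCH (n:Person)
WITH n.Age AS age, collect(n) AS ns, count(*) AS cn
WHERE cn > 1
// Only the write (the apoc call) is inside the subquery.
CALL {
  WITH ns
  CALL apoc.refactor.mergeNodes(ns, {properties: {`.*`: 'discard'}}) YIELD node
  RETURN node
} IN TRANSACTIONS OF 1 ROW  // assumed batch size: one age group per transaction
RETURN age, cn
```

Each age group is still merged within a single transaction, so very large groups may still need a lot of heap.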

In the past few days, I first updated my Neo4j to 5.2, then I tried a variety of queries, including:

apoc.periodic.commit

apoc.periodic.iterate

call {} in transactions

apoc.cypher.parallel

I also tried combinations of these.
I tried to reduce the time and size of each transaction by using the above, but it didn't work.
In all cases, one of the following two things happens:
1- The merge operation completes, but slowly.
2- System resources are heavily used, but nothing happens over time: there is no error, and no merge takes place.

It seems that the merge operation is not executed in parallel.