11-20-2022 01:35 AM
Hi,
I'm using Neo4j 4.4.8, and my database has 10M nodes and 3M relationships.
I want to merge the nodes that share the same "Age" property, but the Cypher below does nothing: no hang, no crash, no error and, obviously, no merges 🙂
There are 80 distinct values in props, which means the subquery should run 80 times.
MATCH (n:Person) with distinct n.Age as props
UNWIND props as prop
call{
WITH prop
MATCH (m:Person {Age:prop})
WITH m order by m.Id ASC
with COLLECT(m) AS ns, count(m) as cn where cn > 1
CALL apoc.refactor.mergeNodes(ns, {properties:{`.*`: 'discard'}}) YIELD node RETURN count(*) as s
} RETURN count(s)
Memory config:
dbms.memory.heap.initial_size=8g
dbms.memory.heap.max_size=12g
dbms.memory.pagecache.size=6g
Is there a problem with mergeNodes, or is my Cypher badly written?
Please help me.
Thanks
11-20-2022 05:17 AM
Try this:
MATCH (n:Person)
with n.Age as age, collect(n) as ns, count(*) as cn
where cn > 1
call apoc.refactor.mergeNodes(ns, {properties:{`.*`: 'discard'}}) yield node
return age, cn
11-20-2022 09:34 PM
Thanks for your reply, but I want to merge nodes with the same age. In your Cypher this does not happen.
11-21-2022 05:17 AM
It should, since the ‘with’ clause by age will group the nodes with the same age and collect those, so each ‘ns’ collection contains the nodes with the same age.
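A read-only way to check the grouping before merging anything is to return the group sizes instead of calling the procedure. This is only a minimal sketch, using the Person label and Age property from the question:

```cypher
// Read-only check: how many Person nodes share each Age value?
// Nothing is written; only the grouping done by WITH is inspected.
MATCH (n:Person)
WITH n.Age AS age, count(*) AS cn
WHERE cn > 1
RETURN age, cn
ORDER BY cn DESC;
```

If there are only about 80 distinct ages, each group may contain a very large number of nodes, in which case every mergeNodes call would be a single very large write.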
11-22-2022 10:58 PM
You are right, thank you for your answer.
But this query also does nothing, just like mine. I did solve the problem by using apoc.periodic.commit; however, it runs very slowly. 😞
call apoc.periodic.commit(
"MATCH (n:Person) with n limit $limit
with n.Age as age, collect(n) as ns, count(*) as cn
where cn > 1
call apoc.refactor.mergeNodes(ns, {properties:{`.*`: 'discard'}}) yield node
return count(*)",{limit:10000})
11-23-2022 05:04 AM - edited 11-23-2022 05:09 AM
Periodic commit repeatedly executes the Cypher statement until it returns 0, at which point it stops. In your case, the query executes, gets the first 10000 nodes, merges them, and repeats until all the nodes are merged and cn > 1 is no longer true. It's possible that a batch of 10000 nodes does not contain all the nodes for the ages represented in that batch, so it can take more than one round to merge a specific value of age.
You should try using Cypher's 'call {} in transactions of 10000 rows' statement instead. If you are executing it in Neo4j Browser, you will need to add ":auto" at the very beginning of your query.
I also think the call subquery only needs to enclose the 'write' part of the query, which is the call to the apoc method.
https://neo4j.com/docs/cypher-manual/current/clauses/call-subquery/#subquery-call-in-transactions
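As a rough sketch of that suggestion, the grouping can stay outside the subquery while only the merge runs inside it. This assumes a Neo4j version in which CALL {} IN TRANSACTIONS accepts a subquery that returns values (Neo4j 5, as far as I know; 4.4 may reject it), and the batch size of one row is my own choice, not something tested on this data:

```cypher
// ":auto" is needed in Neo4j Browser so the query runs in an implicit transaction.
:auto MATCH (n:Person)
WITH n.Age AS age, collect(n) AS ns
WHERE size(ns) > 1
CALL {
  // Only the write part is inside the subquery.
  WITH ns
  CALL apoc.refactor.mergeNodes(ns, {properties: {`.*`: 'discard'}}) YIELD node
  RETURN node
} IN TRANSACTIONS OF 1 ROWS
RETURN age, count(*) AS merged;
```

Each incoming row here is one whole age group, so a batch of 10000 rows would put every group into the same transaction. With one row per transaction each age is committed separately, although a single group can still be very large.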
11-27-2022 09:28 PM - edited 11-28-2022 12:32 AM
In the past few days, I first updated my Neo4j to 5.2, then I tried a variety of queries, including:
apoc.periodic.commit
apoc.periodic.iterate
call {} in transactions
apoc.cypher.parallel
and combinations of these with each other.
I tried to reduce the duration and size of each transaction with the approaches above, but it didn't work.
In all cases, one of the following two things happens:
1- The merge is performed, but slowly.
2- System resources are heavily used, but nothing happens over time: no error is raised and no merge takes place.
It seems that the merge operation is not executed in parallel.
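For reference, a non-parallel apoc.periodic.iterate call that also reports failures might look like the sketch below. The batchSize and parallel values are assumptions for illustration, not tuned settings. Returning failedBatches and errorMessages may help with the "no error" case, since iterate reports batch errors in its result rather than raising them:

```cypher
// First statement: produce one row per age group (read-only).
// Second statement: merge one group per transaction (batchSize: 1).
CALL apoc.periodic.iterate(
  "MATCH (n:Person)
   WITH n.Age AS age, collect(n) AS ns
   WHERE size(ns) > 1
   RETURN ns",
  "CALL apoc.refactor.mergeNodes(ns, {properties: {`.*`: 'discard'}}) YIELD node
   RETURN count(*)",
  {batchSize: 1, parallel: false}
)
YIELD batches, total, failedBatches, errorMessages
RETURN batches, total, failedBatches, errorMessages;
```

Merging moves relationships onto the surviving node and has to lock the nodes involved, so parallel batches that touch the same neighbours could block each other. That is a possible explanation for the behaviour described above, not a confirmed one.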