12-19-2022 06:54 PM - edited 12-19-2022 11:35 PM
The following is the Cypher query that I used to merge the duplicate nodes:
###################
MATCH (n:User)
WITH n.user AS repeatuser, collect(n) AS nodes
WHERE size(nodes) > 1
CALL apoc.refactor.mergeNodes(nodes)
YIELD node
RETURN node
######################
Question: How can I run the above query faster? I tried the following
Thanks.
12-20-2022 08:20 AM - edited 12-20-2022 08:22 AM
Do you need to return the whole node or anything for that matter? If not, try removing the return statement. If it complains you can’t end with a call without returning anything, return a constant or a limited number of node properties.
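A minimal sketch of the "return a constant or a small value" idea, reusing the User label and user property from the original question. The alias mergedGroups is a made-up name for illustration. Returning a count instead of the full merged nodes avoids sending every node back to the client.

```cypher
// Group duplicate User nodes by their user property
MATCH (n:User)
WITH n.user AS repeatuser, collect(n) AS nodes
WHERE size(nodes) > 1
// Merge each group into a single node
CALL apoc.refactor.mergeNodes(nodes)
YIELD node
// Return only a count rather than the merged nodes themselves
RETURN count(node) AS mergedGroups
```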
You could wrap the APOC procedure call in a ‘CALL {} IN TRANSACTIONS’ subquery, importing ‘nodes’ into it using ‘WITH’. This would batch the updates into separate transactions.
In your implementation using ‘apoc.periodic.iterate’, you are matching twice to get the same nodes. I would suggest the first query create the collections and return them. The second query calls the apoc method for each collection of nodes created in the first query. This would be similar to using ‘call subquery’.
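A rough sketch of that two-statement shape for apoc.periodic.iterate, again using the User label and user property from the original question. The config values (batchSize of 1000, parallel set to false) are assumptions to tune for your data, not recommendations from this thread's original query.

```cypher
CALL apoc.periodic.iterate(
  // First statement: runs once and returns one collection per duplicated user value
  "MATCH (n:User)
   WITH n.user AS repeatuser, collect(n) AS nodes
   WHERE size(nodes) > 1
   RETURN nodes",
  // Second statement: runs once per returned row, committed in batches
  "CALL apoc.refactor.mergeNodes(nodes)
   YIELD node
   RETURN count(*)",
  // Assumed config: adjust batchSize to your memory limits
  {batchSize: 1000, parallel: false}
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages
```

The nodes are matched only once, in the first statement, and the second statement receives each collection by the name it was returned under.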
12-20-2022 07:24 PM - edited 12-20-2022 07:24 PM
Thanks, the "call subquery" and "remove return" suggestions both work. But for your last suggestion, I tried the following
12-20-2022 07:51 PM - edited 12-20-2022 07:53 PM
You should not need the call subquery. I suggested using ‘call subquery with transactions’ as an alternative to apoc.periodic.iterate.
I assume the nodes you are merging have relationships, which will be merged too. As such, you may get lock contention when several batches touch the same nodes or relationships. Try not running it in parallel. Also, try increasing the batch size. You could try 10,000. Decrease it if you experience memory issues.
12-20-2022 10:45 PM - edited 12-20-2022 10:46 PM
Excuse me, I now have 483,000 nodes (all named "process") with 2,950,000 relationships (all named "fork"). I tried the following:
(A): Call subquery with transactions
CALL{
MATCH (n:Process)
WITH n.pid AS repeatpid, collect(n) AS nodes
WHERE size(nodes) > 1
CALL{
WITH nodes
CALL apoc.refactor.mergeNodes(nodes,{properties:'combine'})
YIELD node
RETURN 5
}
}
(B) : apoc.periodic (no parallel)
12-20-2022 11:07 PM
How do you plan on running this?
‘Call {} in transactions’ only works with implicit (auto-commit) transactions. This requires prepending ‘:auto’ when executing in the browser.
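:auto
MATCH (n:Process)
WITH n.pid AS repeatpid, collect(n) AS nodes
WHERE size(nodes) > 1
CALL {
  WITH nodes
  CALL apoc.refactor.mergeNodes(nodes, {properties: 'combine'})
  YIELD node
  // An expression returned from a subquery needs an alias
  RETURN 5 AS done
} IN TRANSACTIONS OF 10000 ROWS
// A query cannot end with a returning subquery, so finish with a RETURN
RETURN count(*) AS mergedGroups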
Can you remove the ‘return’, or both the ‘yield’ and ‘return’, or does it complain that neither is allowed?
https://neo4j.com/docs/cypher-manual/current/clauses/call-subquery/#_batching
How do you have so many duplicates?
12-21-2022 12:44 AM - edited 12-21-2022 12:48 AM
I tried what you showed, i.e.