01-31-2022 02:42 PM
Hello everyone,
I am having some problems with large-scale deletions in my database. Following this article, Large Delete Transaction Best Practices in Neo4j - Knowledge Base, I decided to use the following query to perform the deletions:
call apoc.periodic.iterate("MATCH (n: NodeTypeToBeDeleted) return id(n) as id", "MATCH (n) WHERE id(n) = id DETACH DELETE n", {batchSize:10000})
yield batches, total return batches, total
However, the query fails with the following error:
Failed to apply transaction: Transaction #1466 at log position LogPosition{logVersion=57, byteOffset=169956717} {started 2022-01-31 21:57:53.495+0000, committed 2022-01-31 21:57:54.181+0000, with 138588 commands in this transaction, lease -1, latest committed transaction id when started was 1465, additional header bytes: }
01-31-2022 08:01 PM
Usually this kind of outcome means that the combination of the nodes deleted per batch, plus the attached relationships that also have to be deleted with them, is more than the heap can handle. That can lead to out-of-memory events, which in turn can result in the database being quarantined. Supernodes, nodes with a very large number of relationships, are the usual culprits here.
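If you want to confirm whether supernodes are involved, a quick degree check along these lines will show the most heavily connected nodes of that label (this uses apoc.node.degree, and borrows the NodeTypeToBeDeleted label from your query):

MATCH (n:NodeTypeToBeDeleted)
RETURN id(n) AS id, apoc.node.degree(n) AS degree
ORDER BY degree DESC LIMIT 10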
A good way to handle this is to change your MATCH to target the relationships attached to the to-be-deleted nodes and delete those relationships in batches first, so that only 10k or so are removed at a time.
The outer (driving) query and inner (action) query to use would be:
"MATCH (n: NodeTypeToBeDeleted)-[r]-() return id(r) as id"
and
"MATCH ()-[r]->() WHERE id(r) = id DELETE r"
Once the relationships are deleted, you can go ahead with the deletion of the nodes themselves using the original batch delete query you were attempting.