05-10-2022 01:05 PM
I'm using Neo4j Enterprise (v4.4.2) on an AWS EC2 instance running CentOS 7, and I'm having trouble cleaning up a database after a runaway query added about 8M excess labeled nodes.
I'm running a "DETACH DELETE" operation that removes these roughly 8M nodes in small batches (10,000 nodes each). I use this small batch size to avoid running out of memory.
Although the query appears to be behaving as desired, Neo4j is filling the disk with a large new transaction log file every minute or so.
Here is an excerpt from /var/log/neo4j/debug.log:
2022-05-10 19:32:35.273+0000 INFO [o.n.k.d.Database] [covid-b/2ffabc51] Rotated to transaction log [/var/lib/neo4j/data/transactions/covid-b/neostore.transaction.db.131] version=130, last transaction in previous log=523146, rotation took 48 millis, started after 71159 millis.
2022-05-10 19:33:09.861+0000 INFO [o.n.k.d.Database] [covid-b/2ffabc51] Rotated to transaction log [/var/lib/neo4j/data/transactions/covid-b/neostore.transaction.db.132] version=131, last transaction in previous log=523176, rotation took 86 millis, started after 34502 millis.
2022-05-10 19:33:44.477+0000 INFO [o.n.k.d.Database] [covid-b/2ffabc51] Rotated to transaction log [/var/lib/neo4j/data/transactions/covid-b/neostore.transaction.db.133] version=132, last transaction in previous log=523206, rotation took 45 millis, started after 34571 millis.
2022-05-10 19:34:16.330+0000 INFO [o.n.k.d.Database] [covid-b/2ffabc51] Rotated to transaction log [/var/lib/neo4j/data/transactions/covid-b/neostore.transaction.db.134] version=133, last transaction in previous log=523236, rotation took 44 millis, started after 31809 millis.
2022-05-10 19:34:49.175+0000 INFO [o.n.k.d.Database] [covid-b/2ffabc51] Rotated to transaction log [/var/lib/neo4j/data/transactions/covid-b/neostore.transaction.db.135] version=134, last transaction in previous log=523266, rotation took 47 millis, started after 32798 millis.
The resulting files in the transactions subdirectory are many and large:
ls -l /var/lib/neo4j/data/transactions/covid-b
total 6341444
-rw-r--r-- 1 root root 176896 May 10 15:36 checkpoint.0
-rw-r--r-- 1 root root 281823579 May 10 15:20 neostore.transaction.db.120
-rw-r--r-- 1 root root 300188833 May 10 15:22 neostore.transaction.db.121
-rw-r--r-- 1 root root 300139578 May 10 15:23 neostore.transaction.db.122
-rw-r--r-- 1 root root 300247329 May 10 15:24 neostore.transaction.db.123
-rw-r--r-- 1 root root 300140025 May 10 15:25 neostore.transaction.db.124
-rw-r--r-- 1 root root 266161319 May 10 15:26 neostore.transaction.db.125
-rw-r--r-- 1 root root 300331081 May 10 15:28 neostore.transaction.db.126
-rw-r--r-- 1 root root 300511449 May 10 15:29 neostore.transaction.db.127
-rw-r--r-- 1 root root 299871860 May 10 15:30 neostore.transaction.db.128
-rw-r--r-- 1 root root 300844707 May 10 15:31 neostore.transaction.db.129
-rw-r--r-- 1 root root 263526666 May 10 15:32 neostore.transaction.db.130
-rw-r--r-- 1 root root 270276758 May 10 15:33 neostore.transaction.db.131
-rw-r--r-- 1 root root 270243754 May 10 15:33 neostore.transaction.db.132
-rw-r--r-- 1 root root 269995516 May 10 15:34 neostore.transaction.db.133
-rw-r--r-- 1 root root 270051214 May 10 15:34 neostore.transaction.db.134
-rw-r--r-- 1 root root 270246701 May 10 15:35 neostore.transaction.db.135
-rw-r--r-- 1 root root 270053812 May 10 15:35 neostore.transaction.db.136
-rw-r--r-- 1 root root 262469905 May 10 15:36 neostore.transaction.db.137
-rw-r--r-- 1 root root 262577377 May 10 15:37 neostore.transaction.db.138
-rw-r--r-- 1 root root 270839820 May 10 15:37 neostore.transaction.db.139
-rw-r--r-- 1 root root 270257299 May 10 15:38 neostore.transaction.db.140
-rw-r--r-- 1 root root 262144000 May 10 15:38 neostore.transaction.db.141
According to du -h, Neo4j filled this directory with more than 6G of logs in just 18 minutes.
What am I doing wrong and what should I do differently?
05-10-2022 02:58 PM
DETACH DELETE also removes relationships. Do you have dense nodes, i.e. nodes that have, for example, 50k relationships? Deleting one of those may look like deleting 1 node, but it is really deleting 1 node and 50k relationships.
You can also influence transaction log retention via dbms.tx_log.rotation.retention_policy (see Configuration settings - Operations Manual),
and this can be set dynamically via CALL dbms.setConfigValue()
(see Dynamic settings - Operations Manual).
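As a rough sketch of the suggestion above, the retention policy could be tightened at runtime and a checkpoint then requested, so that older log files become eligible for pruning. The value '1G size' is only an illustrative assumption, not a recommendation; choose a limit that fits your backup and recovery needs. db.checkpoint() is assumed to be available in your 4.4 Enterprise installation.

```cypher
// Keep roughly 1G of transaction logs instead of the configured default.
// '1G size' is an example value chosen for illustration only.
CALL dbms.setConfigValue('dbms.tx_log.rotation.retention_policy', '1G size');

// Request a checkpoint; log files that are no longer needed under the
// retention policy can be pruned once a checkpoint has completed.
CALL db.checkpoint();
```

Values set through dbms.setConfigValue() are not persisted across restarts. To make the change permanent, put the same setting in neo4j.conf.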
05-10-2022 03:15 PM
I don't think I have any "dense" nodes as you describe them.
Each deleted node (Datapoint) has a single labeled :DATASET relationship to an instance of another labeled node (Dataset). There are typically about 3K Datapoint instances bound to each Dataset, although one anomalous Dataset has many more than that.
I'm not familiar with the transaction-related settings and have not attempted to configure any of them.
I'll read more about "Dynamic settings". I'm attempting to do a one-time patch of two databases.
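For reference, a batched delete like the one described in this thread could be sketched in 4.4 roughly as follows. This is an assumption about the query shape, not the query actually used here: the `excess` property is a hypothetical placeholder for whatever identifies the unwanted Datapoint nodes.

```cypher
:auto MATCH (d:Datapoint)
// Hypothetical predicate: replace with the real condition for excess nodes.
WHERE d.excess = true
CALL {
  WITH d
  // Removes the node together with its :DATASET relationship.
  DETACH DELETE d
} IN TRANSACTIONS OF 10000 ROWS;
```

CALL { ... } IN TRANSACTIONS has to run in an auto-commit transaction, which is what the :auto prefix requests in Neo4j Browser. Batching limits the memory needed per transaction, but every deleted node and relationship is still written to the transaction log, so the total log volume stays roughly the same.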