03-01-2020 06:38 AM
I am trying to understand some curious behavior in Neo4j that is desirable but perplexing: on occasion, the size of the stored data on disk seems to shrink. In the most recent example, I was loading more nodes and edges via CSV bulk load. During the process, the number of nodes and edges increased, but the size on disk decreased (as reported on Linux by: df -h | grep "var/lib/neo4j/data").
BEFORE: 308 million nodes, 441 million edges, 755 GB on disk
AFTER: 309 million nodes, 453 million edges, 516 GB on disk
Is there some background process that compresses data from time to time? System details are below:
Dell OptiPlex 7010, i7-3770, 24GB RAM
Ubuntu Linux 18.04
Neo4j 4.0.0 (though I observed a similar phenomenon with Neo4j 3.5.x)
Driver: py2neo for Python
OS drive: 250GB SSD
Data drive: 2TB HDD formatted as ext4, mounted at /var/lib/neo4j/data to hold the large amount of data
Thanks in advance for any insight.
03-01-2020 08:54 AM
It is possible that the transaction logs were cleaned up. What is your retention policy for transaction logs? By default they are retained for a week. During a bulk load, the transaction logs can grow large because a lot of writes are happening. After a week, logs that fall outside the retention policy may be removed, which would free up the disk space.
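To check how much of the data directory is taken up by transaction logs, something like the following Python sketch might help. It assumes the transaction log files have names starting with neostore.transaction.db and that they live somewhere under the directory you pass in; the function name disk_usage_by_kind is just made up for illustration. Run it as a user that can read the Neo4j data directory.

```python
import os

# Assumption: transaction log files are named neostore.transaction.db.<N>
TX_LOG_PREFIX = "neostore.transaction.db"


def disk_usage_by_kind(root):
    """Walk root and total file sizes in bytes, split into tx logs and everything else."""
    totals = {"tx_logs": 0, "other": 0}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # File vanished (e.g. a log was pruned mid-walk) or is unreadable; skip it
                continue
            kind = "tx_logs" if name.startswith(TX_LOG_PREFIX) else "other"
            totals[kind] += size
    return totals


if __name__ == "__main__":
    # Path taken from the question; adjust for your installation
    data_dir = "/var/lib/neo4j/data"
    for kind, size in disk_usage_by_kind(data_dir).items():
        print(f"{kind}: {size / 1e9:.1f} GB")
```

Running this before and after the retention window passes should show whether the drop in disk usage comes from the tx_logs total rather than from the store files.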
03-01-2020 01:33 PM
@anthapu, ah, that must be it. Thank you! I did a very large upload about a week ago that added ~200 million nodes and ~200 million edges. Checking the configuration file, I see:
:~$ cat /etc/neo4j/neo4j.conf | grep "retention"
dbms.tx_log.rotation.retention_policy=7 days
Thanks again!
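For anyone who wants to do the same check from Python rather than with grep, here is a minimal sketch. It assumes the config file uses plain key=value lines with # for comments; the helper name find_settings is hypothetical, not part of Neo4j or py2neo.

```python
def find_settings(conf_text, needle):
    """Return {key: value} for uncommented key=value lines whose key contains needle."""
    found = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not key=value pairs
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if needle in key:
            found[key.strip()] = value.strip()
    return found


if __name__ == "__main__":
    # Sample text standing in for /etc/neo4j/neo4j.conf
    sample = (
        "#dbms.tx_log.rotation.retention_policy=2 days\n"
        "dbms.tx_log.rotation.retention_policy=7 days\n"
    )
    print(find_settings(sample, "retention"))
```

Unlike the plain grep, this skips commented-out settings, so only the value that is actually in effect in the file is reported.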