02-20-2020 05:06 AM
In this post, I’ll walk through how you can take a backup of an existing database and use it to seed an instance of Neo4j inside a Docker container. This could be useful if you are looking to fire up a development server using real data. I’ll show you how to launch an instance of Neo4j using docker-compose and then extend the official Docker image by creating a custom Dockerfile.
You can find Neo4j images going back to 3.4 on Docker Hub, all named as x.y.z for Community Edition and with -enterprise appended for Enterprise Edition. You can get up and running quickly with the docker run command.
In order to run Enterprise Edition, you need to accept the Neo4j Licensing Agreement. You do this with Docker by setting the NEO4J_ACCEPT_LICENSE_AGREEMENT
variable to yes
.
docker run --env=NEO4J_ACCEPT_LICENSE_AGREEMENT=yes neo4j:4.0.0-enterprise
neo4j-admin backup
The neo4j-admin tool located in the $NEO4J_HOME/bin folder allows you to run a number of administration commands, including backup, restore, and an import tool for imports of over 1M rows.
There are two types of exports - dump and backup. The dump command creates an archive that can be easily shared and is great for smaller databases. For larger databases, the backup command allows you to do an incremental backup: if you run a backup against a directory that already contains a previous backup, it will take the difference and append it to the store files rather than starting from transaction id 0. This is great for larger databases.
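As a rough sketch of the two approaches (paths, hostname and port are placeholders, and the server must be stopped for a dump):

```shell
# Full offline dump: a single archive that is easy to share
bin/neo4j-admin dump --database=neo4j --to=/backups/neo4j.dump

# Online backup: re-running against the same --backup-dir only
# transfers the changes since the previous backup (incremental)
bin/neo4j-admin backup --from=localhost:6362 --backup-dir=/backups --database=neo4j
```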
The backup service is enabled by default, but by default it will only listen for requests on the backup port from localhost. Because we’ll be taking the backup from another machine, we’ll need to enable remote backups by setting dbms.backup.listen_address to 0.0.0.0:6362.
neo4j.conf
# Enable online backups to be taken from this database.
dbms.backup.enabled=true
# By default the backup service will only listen on localhost.
# To enable remote backups you will have to bind to an external
# network interface (e.g. 0.0.0.0 for all interfaces).
# The protocol running varies depending on deployment. In a Causal Clustering environment this is the
# same protocol that runs on causal_clustering.transaction_listen_address.
dbms.backup.listen_address=0.0.0.0:6362
To run the backup you can run the following command:
# --from:       hostname or IP and backup port of the Neo4j server
# --backup-dir: directory to store the backup in
# --database:   the name of the database to back up
bin/neo4j-admin backup \
  --from=neo4jurl:6362 \
  --backup-dir=/path/to/backups \
  --database=neo4j
A consistency check runs as part of the backup to make sure that the backup files are OK, but this can take a while on a large database. You can disable it by adding --check-consistency=false and check the consistency at a later time.
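If you skip the check during the backup, the consistency of the backup files can be verified later with the neo4j-admin check-consistency command (the backup path below is a placeholder):

```shell
# Verify a previously taken backup at a convenient time
bin/neo4j-admin check-consistency --backup=/path/to/backups/neo4j
```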
One of the nice things about Docker is that you can build on or extend existing images with a Dockerfile. These can also be published to Docker Hub, but I won’t cover that here. The FROM keyword allows you to choose an image to build on top of; in this case we want the latest version of Neo4j Enterprise.
Dockerfile
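Assuming the 4.0.0 enterprise tag used earlier, the first line of the Dockerfile would look something like:

```dockerfile
FROM neo4j:4.0.0-enterprise
```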
The neo4j images all automatically start up the Neo4j instance. In this case, we want to run the backup against the production server and restore it before the Neo4j server starts. We can do this by replacing the ENTRYPOINT with one of our own. There’s a lot of complicated stuff going on in the docker-entrypoint.sh file that I don’t really want to replicate and maintain, so instead we can just create a new shell script which performs the backup and restore before calling the original docker-entrypoint.sh file.
my-entrypoint.sh
#!/bin/bash
echo "Running Backup & Restore"
neo4j-admin backup --from=$PRODUCTION --backup-dir=/backup
neo4j-admin restore --from=/backup/neo4j --database=neo4j --force
The script runs the neo4j-admin backup command and places the backup in the /backup directory before restoring it into the default neo4j database. The $PRODUCTION environment variable in the call means that the address of the Neo4j server can be set with an --env flag when the container is created. The --force flag will overwrite any files if they already exist, which is perfect if we’re mounting a volume for the data.
The file ownership caused me a few issues when developing this script: the Neo4j process is run by a user called neo4j, whereas this entrypoint script is run by root. Originally, this caused a Neo.TransientError.Database.DatabaseUnavailable error complaining that Database 'neo4j' is unavailable. This was because the neo4j user couldn’t write to the directory. chowning the /data directory to neo4j:neo4j fixes this issue.
my-entrypoint.sh
chown -R neo4j:neo4j /data
After that, the original docker-entrypoint.sh script can be run to work its magic and bring the database up.
my-entrypoint.sh
/docker-entrypoint.sh neo4j
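Assembled from the snippets above, the complete my-entrypoint.sh looks like this:

```shell
#!/bin/bash
echo "Running Backup & Restore"

# Take an (incremental) backup from the production server,
# whose address is supplied via the PRODUCTION env var
neo4j-admin backup --from=$PRODUCTION --backup-dir=/backup

# Restore it into the default neo4j database, overwriting existing files
neo4j-admin restore --from=/backup/neo4j --database=neo4j --force

# The neo4j user must own /data for the server to start
chown -R neo4j:neo4j /data

# Hand over to the original entrypoint to start the server
/docker-entrypoint.sh neo4j
```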
Dockerfile
Back in Dockerfile
, a few commands are needed to clean things up. Firstly, we’ll need to accept the license agreement.
Dockerfile
ENV NEO4J_ACCEPT_LICENSE_AGREEMENT yes
Next, setting the dbms.directories.data
directory to a folder in the root will make it easier to mount a volume.
Dockerfile
ENV NEO4J_dbms_directories_data /data
Then, my-entrypoint.sh
needs to be copied to the docker container. By default the file will not have execute permissions, so the RUN
command will allow us to run chmod
to add execution permission on the file.
Dockerfile
WORKDIR /
COPY my-entrypoint.sh /my-entrypoint.sh
RUN chmod +x /my-entrypoint.sh
Finally, we can overwrite the ENTRYPOINT
to run my-entrypoint.sh
(and subsequently the original docker-entrypoint.sh
) before running the neo4j
command to start neo4j.
Dockerfile
ENTRYPOINT ["/sbin/tini", "-g", "--", "/my-entrypoint.sh"]
CMD ["neo4j"]
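Putting all of these pieces together, the full Dockerfile looks something like this (the base image tag is an assumption based on the version used earlier):

```dockerfile
FROM neo4j:4.0.0-enterprise

# Accept the license and point the data directory at /data for easy mounting
ENV NEO4J_ACCEPT_LICENSE_AGREEMENT yes
ENV NEO4J_dbms_directories_data /data

# Copy in the custom entrypoint and make it executable
WORKDIR /
COPY my-entrypoint.sh /my-entrypoint.sh
RUN chmod +x /my-entrypoint.sh

# Run the backup & restore before handing over to the original entrypoint
ENTRYPOINT ["/sbin/tini", "-g", "--", "/my-entrypoint.sh"]
CMD ["neo4j"]
```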
The docker build command creates an image that can be used when creating containers. To make life easier, I have tagged the new image as dev using the -t dev flag; otherwise Docker would generate a random hash and the whole thing couldn’t be automated.
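Run from the directory containing the Dockerfile and my-entrypoint.sh, the build looks like this:

```shell
# Build the custom image from the current directory and tag it as "dev"
docker build -t dev .
```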
Containers with the newly created dev
image can be created using the docker run
command. I have mapped the HTTP and Bolt ports using -p
so I can access the Neo4j Browser and query the data via Bolt. As mentioned before, running a backup on a directory with an existing backup will trigger an incremental backup, so I will mount the backup directory as a volume on the Docker container. The same goes for the data directory. The local paths to the volumes need to be absolute, so I have created a $HERE environment variable to make things a bit easier.
# Ports:   map HTTP to 17474 and Bolt to 17687 on the host
# Env:     PRODUCTION holds the address of the production server
# Volumes: mount the backup and data directories from the host
docker run --name=dev \
  -p 17474:7474 \
  -p 17687:7687 \
  --env="PRODUCTION=prod.databases.adamcowley.co.uk:6463" \
  --volume="$HERE/backup:/backup" \
  --volume="$HERE/data:/data" \
  dev
Being fairly inexperienced with Docker, this took me a while to figure out. But once I realised that I can just extend an existing image, my life became a lot easier. This process works well for a single instance, but could also be used to automate the seeding and deployment of Read Replicas. Downloading a copy of a previous backup and mounting it as a volume will speed up the startup process on larger databases.
I’ve put the code up on GitHub - feel free to pull, clone or submit a PR.