08-24-2018 01:57 PM
I would like to know the exact procedure to copy a graph.db file to all three instances in my cluster. I have the file stored locally in all three VMs under /home; now I need to move it to /var/lib/neo4j/data/database? Thank you.
08-24-2018 02:02 PM
graph.db is typically a directory, not an individual file.
1. create a zip archive of the graph.db folder
2. copy it to all three machines
Do the following for each machine, so in other words, each step 3 times:
3. ensure the system service is not running on those machines
4. unzip the same graph.db.zip into the appropriate database folder
5. run neo4j-admin unbind to remove existing cluster state
6. restart the neo4j system service
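The per-machine steps can be sketched as a script. The archive path, database folder, and service name below are assumptions about a typical package install, not something stated in this thread; the run() wrapper only prints each command, so the sketch is safe to execute as written.

```shell
# Per-machine sketch of the steps above. ARCHIVE, DB_DIR, and the
# service name are assumptions -- adjust them for your installation.
ARCHIVE=${ARCHIVE:-/home/graph.db.zip}
DB_DIR=${DB_DIR:-/var/lib/neo4j/data/databases}

run() { echo "+ $*"; }   # dry-run wrapper: prints the command instead of executing it

run sudo systemctl stop neo4j               # 3. stop the service first
run sudo unzip -o "$ARCHIVE" -d "$DB_DIR"   # 4. unzip graph.db into the databases folder
run sudo neo4j-admin unbind                 # 5. remove existing cluster state
run sudo systemctl start neo4j              # 6. start the service again
```

Redefine `run() { "$@"; }` (or drop the prefix) to execute the commands for real on each member.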
08-24-2018 02:39 PM
Thank you David, will try that out.
09-19-2018 09:22 AM
For us, following the above procedure with a cluster backup leads to the following error when starting a 3-server cluster.
2018-09-19 04:57:05.295+0000 ERROR [o.n.b.v.r.ErrorReporter] Client triggered an unexpected error [Neo.DatabaseError.General.UnknownError]: IdAllocation state is probably corrupted or out of sync with the cluster. Local highId is 47022138 and allocation range is IdRange[46834980-46836003, defrag []], reference 0296d1b1-d9a2-4d8c-890c-54ad617d6c92.
When the backup directory is first started in SINGLE mode, shut down, and then distributed to the 2 other servers, the error does not happen. No store upgrade is involved or necessary.
Ideas?
09-19-2018 11:05 AM
We must distinguish between a zipped graph.db folder and a backup, they're not the same thing at all.
A zipped graph.db folder (which is what's being discussed in this thread) can be unzipped and put into place. This is something that users do, but frankly it isn't the recommended way.
The recommended way is to use the backup tool and the restore tool (neo4j-admin restore). But the backup tool creates a backup set which is not the same as a graph.db folder, so you can't use the method described in this thread to restore it. For that, check the docs on neo4j-admin restore.
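The recommended flow can be sketched as commands. The address, backup directory, and backup name below are placeholders of my own, and the run() wrapper only prints each command, so this is a dry run rather than a definitive recipe.

```shell
# Sketch of the recommended backup-then-restore flow. Address, backup
# directory, and backup name are placeholder assumptions.
run() { echo "+ $*"; }   # dry-run wrapper: prints instead of executing

# On a running member: take an online backup.
run neo4j-admin backup --from=127.0.0.1:6362 --backup-dir=/mnt/backup --name=graph.db-backup

# On each member, with the service stopped: restore, clear cluster state, restart.
run sudo systemctl stop neo4j
run sudo neo4j-admin restore --from=/mnt/backup/graph.db-backup --database=graph.db --force
run sudo neo4j-admin unbind
run sudo systemctl start neo4j
```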
09-19-2018 02:02 PM
Thanks a lot. It looks like this has changed recently. neo4j-admin restore is horribly inefficient, which is why we never used it: it requires a remote copy to all cluster members using OS tools, and then the command itself does yet another local copy (unless this has improved recently). For a large database this takes a long time.
It is unfortunate that Neo4j stopped supporting a plain copy in favor of neo4j-admin restore without improving its usability.
09-20-2018 08:32 AM
In the steps below for manually copying graph.db, if I do step 5 then my cluster won't form again after the unbind. However, when I skip the unbind and just restart the cluster after copying the db, my new graph.db is in place and the cluster is able to form. How important is the unbind in this procedure, when we can still get the new db to work without it? Thank you
1. create a zip archive of the graph.db folder
2. copy it to all three machines
Do the following for each machine, so in other words, each step 3 times:
3. ensure the system service is not running on those machines
4. unzip the same graph.db.zip into the appropriate database folder
5. run neo4j-admin unbind to remove existing cluster state
6. restart the neo4j system service
09-21-2018 04:37 AM
If you do end up doing the file copy (not recommended, for the reasons given above), neo4j-admin unbind is necessary so that the cluster state previously held by that node doesn't get out of sync with the newly restored db. In general, whenever you restore a db on a node, you should unbind its cluster state.
Also, when restoring in this manner (file copy) it's important to delete the .id files, as they hold information such as the free lists used for reusing space in your store files. Since these free lists aren't synced in cluster operations, if you've restored via file copy to multiple nodes, they could all be using the same free lists, which is a time bomb for corruption: node 1 adds data and reuses previously-vacated spaces in the stores from the free list; later on node 2 becomes leader, adds data, and uses those same free lists, overwriting data that is now live and not free = data corruption.
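To illustrate the .id cleanup, here is a runnable sketch against a temporary directory standing in for the restored graph.db folder. The store file names are made-up examples; on a real node, point DB at the restored store instead.

```shell
# Demo of removing free-list (.id) files after a file-copy restore.
# Uses a temp dir with fake store files so it is safe to run anywhere;
# on a real node, set DB to the restored graph.db directory instead.
DB=${DB:-$(mktemp -d)}
touch "$DB/neostore.nodestore.db" \
      "$DB/neostore.nodestore.db.id" \
      "$DB/neostore.relationshipstore.db.id"

find "$DB" -name "*.id" -delete   # drop only the .id free-list files

ls "$DB"                          # the store files themselves are untouched
```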
When performing neo4j-admin backup, it should take care of the .id free lists for you, so they shouldn't be a problem.
09-21-2018 05:02 AM
Thank you. However, I have tried multiple times to restore the backup files, but for some reason, after following exactly what the Neo4j documentation says, my clusters still won't form at all. This is a big roadblock for my product release now; I have been troubleshooting this for the past few weeks.
09-21-2018 05:05 AM
@andrew.bowman
Can you tell me if the below process is correct?
Backup the causal clusters while they are running:
neo4j-admin backup --protocol=catchup --from=127.0.0.1:6362 --backup-dir=/mnt/backup --name=graph.db.1 --pagecache=2G
The above command successfully backs up the files into /mnt/backup.
Restore process:
shut down all three core instances: sudo systemctl stop neo4j
unbind all three instances: sudo neo4j-admin unbind
restore graph.db: sudo neo4j-admin restore --from=/mnt/backup/graph.db.1 --database=graph.db --force
start instances: sudo systemctl restart neo4j
09-24-2018 07:39 AM
The restore process looks correct.
It's hard to troubleshoot this without knowing more about what's going on in the logs.
You may also want to make sure permissions are correct throughout the restored graph.db, as that could cause a problem when attempting to start up. If you're able to start up successfully with dbms.mode=SINGLE, then that implicates some cluster-specific issue. If you can't start up in single mode, then it's likely something more basic, such as a permissions issue or something else (the logs might hold clues).
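A quick way to spot permission problems is to look for files not owned by the service user. This sketch assumes the service user is "neo4j" and a default package path; it defaults to the current user and a temp dir so it can be run safely anywhere.

```shell
# List files under the store NOT owned by the service user (empty output = OK).
# SERVICE_USER and DB default to the current user and a temp dir for a safe
# demo; on a real node, use something like:
#   SERVICE_USER=neo4j DB=/var/lib/neo4j/data/databases/graph.db
SERVICE_USER=${SERVICE_USER:-$(id -un)}
DB=${DB:-$(mktemp -d)}
touch "$DB/neostore"   # stand-in store file for the demo

find "$DB" ! -user "$SERVICE_USER" -print

# To fix ownership on a real install, something like:
#   sudo chown -R neo4j:neo4j /var/lib/neo4j/data/databases/graph.db
```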