Neo4j

deemeetree · ‎05-08-2020

I've been trying to remotely back up my Neo4J database for 2 days now and nothing works.

I run

sudo neo4j-admin backup --backup-dir=backup --name=graph.db-backup --from=1.1.1.1:1111 --timeout=50m

The files start to get saved and then simply erased after a while and there's no backup.

I tried setting --pagecache=16M and HEAP_SIZE but it has no effect. Sometimes it just stalls; sometimes I get an error like this:

unexpected error: java.io.IOException: org.neo4j.com.ComException: Channel has been closed

The DB I'm backing up is Enterprise 3.3.3 and the one I'm backing up with is 3.5.14

Thank you for any help.

Is this the normal Neo4J behavior?

david_allen · ‎05-08-2020

The two notable things that you're mentioning here are

You're backing up Neo4j 3.3 with Neo4j 3.5 tools. I believe there were some store upgrade changes between these versions, so I would not advise that. Have you tried using 3.3 tooling?
The concrete error that you've provided suggests there's a network interruption happening somewhere. It's tough to see what's happening without a full paste of the output of the command, and some knowledge of what's happening on the network between you and the database.

deemeetree · ‎05-11-2020

Ok, I'm trying with 3.3 and I get this error now:

command failed: Backup failed: Unexpected Exception

How can I find out what's happening really? Is there a log file for the neo4j-admin backup command?

deemeetree · ‎05-11-2020

And if that doesn't happen and the backup runs until the end, it automatically erases everything and nothing happens: the process stalls.

Honestly, I've been dealing with this for 5 days already. Is it supposed to be that hard to do a backup?

deemeetree · ‎05-11-2020

Just to clarify once again: the files are being copied to the temp-copy folder inside the folder (backup) where I am making the backup. But either there is an error during the backup, or, if the backup completes (judging by the size of temp-copy), at the stage where it's supposed to finalize the write it just erases everything, the process stalls, and everything disappears.

I'm doing a backup from a WebFaction server (production) to a Neo4J db hosted on AWS.

When I look at the log file in /var/log/neo4j/ there's no information there (just that the database has launched or not) and there is no other log I can access or I don't know where it is.

Could it be an issue with permissions of the folder where the backup is made?

Could it be I have to run neo4j-admin backup using systemctl?

Could you please provide some help on this because this topic is not so well documented in your manual and I think it's very important.

deemeetree · ‎05-11-2020

I tried it from another machine, locally, and it can go further but then gives us this error:

2020-05-11 19:20:23.480+0000 INFO [o.n.c.s.StoreCopyClient] Copying index/lucene/relationship/TO/_qmb8.si
2020-05-11 19:20:23.482+0000 INFO [o.n.c.s.StoreCopyClient] Copied index/lucene/relationship/TO/_qmb8.si 427.00 B
2020-05-11 19:20:23.482+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore
2020-05-11 19:20:23.483+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore 16.00 kB
2020-05-11 19:20:23.483+0000 INFO [o.n.c.s.StoreCopyClient] Done, copied 637 files
2020-05-11 19:20:23.590+0000 INFO [o.n.b.BackupService] Start receiving transactions from 3794881
2020-05-11 19:20:32.540+0000 INFO [o.n.b.BackupService] Finish receiving transactions at 3794881
2020-05-11 19:20:32.572+0000 INFO [o.n.b.BackupService] Start recovering store
command failed: Backup failed: Error starting org.neo4j.com.storecopy.ExternallyManagedPageCache$GraphDatabaseFactoryWithPageCacheFactory$1, /Volumes/Extreme SSD/Backup/main-graph.db-backup/temp-copy

jggomez · ‎05-11-2020

Hi, you could try with this program locally... I developed that utility..

https://github.com/jggomez/neo4j-backup.

I hope it can help you.

deemeetree · ‎05-12-2020

Yes, thank you, @jggomez, I saw this, but I don't want it to run locally. Plus, it's using the same neo4j-admin backup command inside your script, and that is not working for me.

I also want to get a conclusive answer from Neo4J engineers: does the backup, which is a feature (apart from clustering) setting the Enterprise version apart from Community, actually work? Or only in some cases and sometimes? And are the 3 pages of documentation that exist on it all there is to understand how it works?

I'm not new to Neo4J, but these 5 days I've spent trying to make this simple task of online backup work are the longest stretch I've ever had with this technology, and my experience is that it's super buggy and unreliable, with not enough options and insufficient documentation.

deemeetree · ‎05-12-2020

Now, even if it happens (1 in 10 times) that the process goes to its completion, at the stage where I am at

2020-05-12 15:25:44.046+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore
2020-05-12 15:25:44.048+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore 16.00 kB
2020-05-12 15:25:44.049+0000 INFO [o.n.c.s.StoreCopyClient] Done, copied 642 files
2020-05-12 15:25:44.202+0000 INFO [o.n.b.BackupService] Start receiving transactions from 3796108
2020-05-12 15:25:46.884+0000 INFO [o.n.b.BackupService] Finish receiving transactions at 3796108
2020-05-12 15:25:46.920+0000 INFO [o.n.b.BackupService] Start recovering store

I get this error after:

command failed: Backup failed: Error starting org.neo4j.com.storecopy.ExternallyManagedPageCache$GraphDatabaseFactoryWithPageCacheFactory$1

I saw a post about it on https://github.com/neo4j/neo4j/issues/11992 and changed the max open files on my system, but that didn't help either.

The last records in the log on the server I am backing up are:

2020-05-12 15:25:42.811+0000 INFO [o.n.k.i.s.c.CountsTracker] About to rotate counts store at transaction 3796320 to [/home/neo4j-enterprise-3.3.3/data/databases/graph.db/neostore.counts.db.b], from [/home/neo4j-enterprise-3.3.3/data/databases/graph.db/neostore.counts.db.a].
2020-05-12 15:25:42.819+0000 INFO [o.n.k.i.s.c.CountsTracker] Successfully rotated counts store at transaction 3796320 to [/home/neo4j-enterprise-3.3.3/data/databases/graph.db/neostore.counts.db.b], from [/home/neo4j-enterprise-3.3.3/data/databases/graph.db/neostore.counts.db.a].
2020-05-12 15:25:47.723+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [3796320]:  Store flush completed
2020-05-12 15:25:47.723+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [3796320]:  Starting appending check point entry into the tx log...
2020-05-12 15:25:47.724+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [3796320]:  Appending check point entry into the tx log completed
2020-05-12 15:25:47.725+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [3796320]:  Check pointing completed
2020-05-12 15:25:47.725+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [898]:  Starting log pruning.
2020-05-12 15:25:47.728+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [898]:  Log pruning complete.
2020-05-12 15:30:25.788+0000 WARN [o.n.k.i.c.MonitorGc] GC Monitor: Application threads blocked for 204ms.

The local backup log (where I'm backing up to) doesn't have any errors.

leo_szumel · ‎05-12-2020

@deemeetree it might be worth double-checking the file limit increase is in effect. IIRC we edited the neo4j-admin script to print ulimit -a. Also note that 65535 has not always been sufficient for us. I would try a much larger value to rule that out as a root cause.
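A minimal sketch of the check suggested above, assuming a POSIX shell. The function name check_nofile_limit is hypothetical and not part of any Neo4j tooling; it only reports whether the soft open-files limit of the current shell meets a minimum you choose.

```shell
#!/bin/sh
# Hypothetical helper (not part of Neo4j): succeed only if the soft
# open-files limit of the current shell is at least the given minimum.
check_nofile_limit() {
    min="$1"
    current="$(ulimit -n)"
    # "unlimited" always satisfies the check
    if [ "$current" = "unlimited" ]; then
        echo "open files limit: unlimited"
        return 0
    fi
    if [ "$current" -ge "$min" ]; then
        echo "open files limit: $current (ok, >= $min)"
        return 0
    fi
    echo "open files limit: $current (too low, wanted >= $min)" >&2
    return 1
}
```

Run it as check_nofile_limit 65535 in the same session, and as the same user, that will launch neo4j-admin backup. A limit raised for one user or one login session does not necessarily apply to a command started through sudo or a service manager.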

deemeetree · ‎05-15-2020

Somebody from Neo4J — could you please respond and advise?

elaine_rosenber · ‎05-15-2020

Can you set debug level for logging and then provide the log file(s)?

dbms.logs.debug.level=DEBUG

And set the env variable NEO4J_DEBUG to true

If you don't want to provide the log file(s) here, you can send them to the Intercom ticket you opened for this case also.

Elaine

deemeetree · ‎05-16-2020

Hi Elaine,

Thank you for responding. I'm actually communicating with you through DM on Twitter but I guess you are receiving it through Intercom, right?

Could you please tell me if I need to set this up as you advised above on the database I am backing up or on the remote system I'm using to do the backup? Or on both?

Thanks

elaine_rosenber · ‎05-18-2020

To be sure, you are using the same version of Neo4j on the system you are backing up and on the system where you are running neo4j-admin backup, correct?

Since you are saying that it appears to do the backup and then the files disappear, I would say that what you need to look at is debugging the system from which you are executing the backup command. The server doesn't seem to be the problem.

That being said, you do not have a log file or debug log file on the system from which you are running neo4j-admin backup, so perhaps setting the env variable will help you. Changing anything in neo4j.conf will not, as you do not use a local Neo4j instance to perform the backup.

Does the server that you want to back up need to be online 24x7? Another option you could try is to shut down the server that you want to back up and try neo4j-admin dump to at least get a dump file for the database.

Elaine
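Since the machine running neo4j-admin backup has no log file of its own, one option is to capture the command's terminal output yourself. The sketch below assumes a POSIX shell; run_logged is a hypothetical wrapper name, and it simply appends stdout and stderr of any command to a file while preserving the exit status.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of Neo4j): run any command, append its
# stdout and stderr to a log file, and return the command's exit status.
run_logged() {
    logfile="$1"
    shift
    {
        echo "=== $(date -u '+%Y-%m-%dT%H:%M:%SZ') running: $*"
        "$@"
        status=$?
        echo "=== exit status: $status"
    } >>"$logfile" 2>&1
    return "$status"
}
```

For example, after export NEO4J_DEBUG=true, a call such as run_logged backup.log neo4j-admin backup --backup-dir=backup --name=graph.db-backup --from=1.1.1.1:1111 would leave the full output, including any stack trace the tool prints, in backup.log. The output no longer appears on the terminal, so follow it with tail -f backup.log from a second session if needed.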

deemeetree · ‎05-19-2020

Hello Elaine,

The whole point of me switching to the enterprise version was to be able to do online backups. So I want to be able to do those.

Regarding the backup files — I already provided all the data from all the sources (both remote and local) above.

It looks like the backup feature in Neo4J Enterprise works really badly and is super unstable.

I guess I should just switch back to Community and do offline backups as before, right?

elaine_rosenber · ‎05-19-2020

For now, can you back up on the same system as the server just to make sure that the backup works locally? Then copy the backup files to a different system. This will at least enable you to backup your database without any interruption of service.

Elaine

deemeetree · ‎05-25-2020

Also — is it possible to do an offline backup, then copy it to a remote location, and then do online incremental backups on that offline backup?

deemeetree · ‎05-25-2020

I can try to do it on the same system, but do you know if it's going to slow down my app / database drastically compared to a remote backup? And how can I ensure that doesn't happen? Thanks!

deemeetree · ‎05-25-2020

Also, I'm doing it locally and it just crashes my database with a message

command failed: Backup failed: Unexpected Exception

elaine_rosenber · ‎05-26-2020

If the database is crashing, you need to investigate why. I suggest that for starters, you take the database down and dump it. Then perform a consistency check on it.

There is definitely something wrong here that needs to be investigated.

Elaine

deemeetree · ‎05-29-2020

I made the dump of the DB and then copied it to the remote server, so I can make an incremental backup on it.

Launched the neo4j-admin backup script remotely, got the following response after:

Destination is not empty, doing incremental backup...
Doing consistency check...
2020-05-29 13:34:37.631+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Selected RecordFormat:StandardV3_2[v0.A.8] record format from store /home/ubuntu/backup/graph.db
2020-05-29 13:34:37.631+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Format not configured. Selected format from the store: RecordFormat:StandardV3_2[v0.A.8]
2020-05-29 13:34:37.673+0000 INFO [o.n.m.MetricsExtension] Initiating metrics...
....................  10%
...Killed

The log on the backed-up DB server:

2020-05-29 13:34:22.442+0000 INFO [o.n.b.BackupImpl] BackupServer:13462-1: Incremental backup started...
2020-05-29 13:34:22.446+0000 INFO [o.n.b.BackupImpl] BackupServer:13462-1: Incremental backup finished.

When I check consistency of the dump it seems ok:

Does this mean the backup was successfully done?
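A note on the "Killed" line in the output above: a shell prints that word when a foreground process is terminated by SIGKILL, and the process then exits with a status above 128. The sketch below, assuming a POSIX shell, decodes an exit status; describe_exit is a hypothetical helper name. What sent the signal is not established in this thread. On Linux, one common sender is the kernel's out-of-memory killer, which is an assumption to verify in the kernel log (for example with dmesg), not a confirmed diagnosis.

```shell
#!/bin/sh
# Hypothetical helper: describe a shell exit status. Statuses above 128
# conventionally mean "terminated by signal (status - 128)".
describe_exit() {
    status="$1"
    if [ "$status" -eq 0 ]; then
        echo "success"
    elif [ "$status" -gt 128 ]; then
        echo "terminated by signal $((status - 128))"
    else
        echo "failed with exit code $status"
    fi
}
```

Calling describe_exit $? immediately after the failed command shows whether it exited on its own or was killed. Signal 9 is SIGKILL, which the process cannot catch, so the tool itself gets no chance to log anything.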

elaine_rosenber · ‎05-26-2020

Can you send the log file for this time-period where the local backup failed?

Elaine

deemeetree · ‎05-26-2020

It is exactly as the logs above... Just breaks and doesn't record anything...

elaine_rosenber · ‎05-26-2020

I would still like to see anything that was written to neo4j.log, debug.log, and the file you specified for the backup. It might also be useful to see what you modified in the neo4j.conf file. More information will help us to help you better.

Elaine

elaine_rosenber · ‎05-29-2020

So you successfully dumped the database to a file. That is great.

But... you still don't know if the database you dumped is consistent. You must run the consistency check on a database that is not started. You cannot do a consistency check on a dump file.

If that consistency check is successful, then you need to try to backup the database locally first to see if it can be backed up locally.

Elaine

deemeetree · ‎05-30-2020

Hello, the consistency check on the DB gives the following error:

2020-05-30 09:54:13.573+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Selected RecordFormat:StandardV3_2[v0.A.8] record format from store /home/ubuntu/backup/graph.db
2020-05-30 09:54:13.577+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Format not configured. Selected format from the store: RecordFormat:StandardV3_2[v0.A.8]
2020-05-30 09:54:13.984+0000 INFO [o.n.m.MetricsExtension] Initiating metrics...
....................  10%
....................  20%
....................  30%
....................  40%
....................  50%
....................  60%
....................  70%
....................  80%
.......2020-05-30 10:40:27.039+0000 WARN [o.n.c.ConsistencyCheckService] Label index was not properly shutdown and rebuild is required.
	Label index: neostore.labelscanstore.db
2020-05-30 10:40:29.602+0000 WARN [o.n.c.ConsistencyCheckService] Index was not properly shutdown and rebuild is required.
	Index[ IndexRule[id=1, descriptor=Index( GENERAL, :label[0](property[0]) ), provider={key=lucene, version=1.0}] ]
2020-05-30 10:40:29.602+0000 WARN [o.n.c.ConsistencyCheckService] Index was not properly shutdown and rebuild is required.
	Index[ IndexRule[id=2, descriptor=Index( GENERAL, :label[0](property[1]) ), provider={key=lucene, version=1.0}] ]
2020-05-30 10:40:29.603+0000 WARN [o.n.c.ConsistencyCheckService] Index was not properly shutdown and rebuild is required.
	Index[ IndexRule[id=3, descriptor=Index( GENERAL, :label[1](property[0]) ), provider={key=lucene, version=1.0}] ]
2020-05-30 10:40:29.603+0000 WARN [o.n.c.ConsistencyCheckService] Index was not properly shutdown and rebuild is required.
	Index[ IndexRule[id=4, descriptor=Index( GENERAL, :label[1](property[1]) ), provider={key=lucene, version=1.0}] ]
2020-05-30 10:40:29.604+0000 WARN [o.n.c.ConsistencyCheckService] Index was not properly shutdown and rebuild is required.
	Index[ IndexRule[id=5, descriptor=Index( GENERAL, :label[2](property[0]) ), provider={key=lucene, version=1.0}] ]
2020-05-30 10:40:29.604+0000 WARN [o.n.c.ConsistencyCheckService] Index was not properly shutdown and rebuild is required.
	Index[ IndexRule[id=6, descriptor=Index( GENERAL, :label[2](property[1]) ), provider={key=lucene, version=1.0}] ]
2020-05-30 10:40:29.604+0000 WARN [o.n.c.ConsistencyCheckService] Index was not properly shutdown and rebuild is required.
	Index[ IndexRule[id=7, descriptor=Index( GENERAL, :label[3](property[0]) ), provider={key=lucene, version=1.0}] ]
2020-05-30 10:40:29.605+0000 WARN [o.n.c.ConsistencyCheckService] Index was not properly shutdown and rebuild is required.
	Index[ IndexRule[id=8, descriptor=Index( GENERAL, :label[3](property[1]) ), provider={key=lucene, version=1.0}] ]
2020-05-30 10:40:29.605+0000 WARN [o.n.c.ConsistencyCheckService] Index was not properly shutdown and rebuild is required.
	Index[ IndexRule[id=9, descriptor=Index( GENERAL, :label[0](property[15]) ), provider={key=lucene, version=1.0}] ]

However, it then goes on to redo the consistency check after this warning and reaches 100%

Do I need to fix those (and how?) and can this affect the online backups and why?

deemeetree · ‎05-30-2020

Ok, I reset and repopulated the broken indexes in the original database.

When I try to back it up remotely I'm again getting this kind of error:

2020-05-30 17:25:13.480+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.propertystore.db
command failed: Backup failed: Unexpected Exception

There is no record with that time / date either in the backed-up DB log or in the remote location (where I'm backing up to).

What to do?

elaine_rosenber · ‎06-01-2020

Did you perform the consistency check after you recreated the indexes?

To be safe, I would do a dump of the database after you have confirmed that it is consistent.

Then... once you bring the database online, can you do a LOCAL backup. I still wonder if there is something with your remote backup configuration.

Elaine

deemeetree · ‎06-01-2020

Hi @elaine.rosenberg

Should I perform the consistency check again after I recreated the indexes?

Then, as I understand, you suggest I do the offline backup (dump).

Then put the DB back online and do a local ONLINE backup to see if that works, right?

Regarding the settings for the remote backup, as I understand the only settings I can use are in the neo4j-admin command on the server where I am backing up to.

I currently set HEAP_SIZE=2G before and then in the actual neo4j-admin backup command:

--timeout=59m --pagecache=2G

The DB I want to back up is about 41Gb.

The full line I use to backup is:

sudo neo4j-admin backup --backup-dir=backup --name=graph.db --from=my.server.ip:port_address --timeout=59m --pagecache=4G

Of course, on the local DB that I am backing up, I set up in neo4j.conf the following settings:

dbms.memory.heap.initial_size=4024m
dbms.memory.heap.max_size=7300m

dbms.memory.pagecache.size=4g

dbms.backup.enabled=true

dbms.backup.address=0.0.0.0:port_address

Then, when I do the backup, I open that port_address on my server and everything works fine.

However, as I mentioned before, at some point the process stalls or quits (numerous errors above).

There is no trace of anything in the logs, either on the server I'm backing up to or on the one I am backing up.

I have Neo4J Enterprise 3.3.3 on both.

Is there anything I'm missing, or do all the settings look good to you?

Thank you!

elaine_rosenber · ‎06-01-2020

Definitely do a consistency check on the database before you back it up.

If that passes, then do the LOCAL backup.

dbms.backup.address=0.0.0.0:6362

Hopefully that "default" port of 6362 is available?

I would also make sure there are no files in the backup directory. You want the initial backup to occur as a complete backup.

Elaine

deemeetree · ‎06-01-2020

Hi Elaine,

I cannot make that "default" port available, so I'm using another port, however, I specified it in the settings (so it's not 6362 but another combination of digits). Is this a problem? I thought if the new port is specified in the settings, it should work, no?

deemeetree · ‎06-01-2020

As you suggested, I ran a consistency check again (it went fine) and did a local online backup.

neo4j-admin backup --backup-dir=backups --name=graph.db --from=localhost:port_number --timeout=59m --pagecache=2G

It got this far and then stopped with a "Killed" message:

Doing full backup...
2020-06-01 22:24:32.947+0000 INFO [o.n.c.s.StoreCopyClient] Copying index.db
2020-06-01 22:24:32.983+0000 INFO [o.n.c.s.StoreCopyClient] Copied index.db 797.00 B
2020-06-01 22:24:32.983+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.nodestore.db.labels
2020-06-01 22:24:33.059+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.nodestore.db.labels 7.97 kB
2020-06-01 22:24:33.060+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.nodestore.db
2020-06-01 22:24:34.703+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.nodestore.db 144.63 MB
2020-06-01 22:24:34.704+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.propertystore.db.index.keys
2020-06-01 22:24:34.711+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.propertystore.db.index.keys 7.98 kB
2020-06-01 22:24:34.711+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.propertystore.db.index
2020-06-01 22:24:34.713+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.propertystore.db.index 8.00 kB
2020-06-01 22:24:34.716+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.propertystore.db.strings
2020-06-01 22:24:50.685+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.propertystore.db.strings 1.23 GB
2020-06-01 22:24:50.686+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.propertystore.db.arrays
2020-06-01 22:24:50.696+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.propertystore.db.arrays 8.00 kB
2020-06-01 22:24:50.696+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.propertystore.db
Killed

In the debug.log it says:

2020-06-01 22:24:29.346+0000 INFO [o.n.k.i.DiagnosticsManager]     - Total: 2018-04-17T07:42:55+0000 - 0.00 B
2020-06-01 22:24:29.346+0000 INFO [o.n.k.i.DiagnosticsManager]   - Total: 2018-04-17T07:42:55+0000 - 473.43 MB
2020-06-01 22:24:29.347+0000 INFO [o.n.k.i.DiagnosticsManager]   store_lock: 2018-04-17T07:44:17+0000 - 0.00 B
2020-06-01 22:24:29.347+0000 INFO [o.n.k.i.DiagnosticsManager] Storage summary:
2020-06-01 22:24:29.347+0000 INFO [o.n.k.i.DiagnosticsManager]   Total size of store: 45.68 GB
2020-06-01 22:24:29.347+0000 INFO [o.n.k.i.DiagnosticsManager]   Total size of mapped files: 36.13 GB
2020-06-01 22:24:29.347+0000 INFO [o.n.k.i.DiagnosticsManager] --- STARTED diagnostics for KernelDiagnostics:StoreFiles END ---
2020-06-01 22:24:31.482+0000 INFO [o.n.b.BackupImpl] BackupServer:13462-1: Full backup started...
2020-06-01 22:24:31.486+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by full backup [3843881]:  Starting check pointing...
2020-06-01 22:24:31.486+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by full backup [3843881]:  Starting store flush...
2020-06-01 22:24:31.607+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by full backup [3843881]:  Store flush completed
2020-06-01 22:24:31.607+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by full backup [3843881]:  Starting appending check point entr$
2020-06-01 22:24:31.611+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by full backup [3843881]:  Appending check point entry into th$
2020-06-01 22:24:31.611+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by full backup [3843881]:  Check pointing completed
2020-06-01 22:24:31.611+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [944]:  Starting log pruning.
2020-06-01 22:24:31.613+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [944]:  Log pruning complete.
2020-06-01 22:24:32.966+0000 INFO [o.n.k.i.DiagnosticsManager] --- SERVER STARTED START ---
2020-06-01 22:24:35.582+0000 INFO [o.n.k.i.DiagnosticsManager] --- SERVER STARTED END ---
2020-06-01 22:24:55.377+0000 WARN [o.n.k.i.c.MonitorGc] GC Monitor: Application threads blocked for 289ms.
2020-06-01 22:25:06.109+0000 WARN [o.n.k.i.c.MonitorGc] GC Monitor: Application threads blocked for 287ms.

So it looks like the DB restarted, right? But why did it do that? It was not accessible by the app, and the backup was the only action I was performing on it...

I then did another attempt.

This was my terminal output:

Doing full backup...
2020-06-01 23:20:25.453+0000 INFO [o.n.c.s.StoreCopyClient] Copying index.db
2020-06-01 23:20:25.515+0000 INFO [o.n.c.s.StoreCopyClient] Copied index.db 797.00 B
2020-06-01 23:20:25.515+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.nodestore.db.labels
2020-06-01 23:20:25.658+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.nodestore.db.labels 7.97 kB
2020-06-01 23:20:25.659+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.nodestore.db
2020-06-01 23:20:27.669+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.nodestore.db 144.63 MB
2020-06-01 23:20:27.673+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.propertystore.db.index.keys
2020-06-01 23:20:27.677+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.propertystore.db.index.keys 7.98 kB
2020-06-01 23:20:27.678+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.propertystore.db.index
2020-06-01 23:20:27.685+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.propertystore.db.index 8.00 kB
2020-06-01 23:20:27.693+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.propertystore.db.strings
2020-06-01 23:20:41.915+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.propertystore.db.strings 1.23 GB
2020-06-01 23:20:41.916+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.propertystore.db.arrays
2020-06-01 23:20:41.933+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.propertystore.db.arrays 8.00 kB
2020-06-01 23:20:41.933+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.propertystore.db
command failed: Backup failed: Unexpected Exception

Got these errors in the debug:

2020-06-01 23:20:23.325+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by full backup [3843881]:  Starting check pointing...
2020-06-01 23:20:23.325+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by full backup [3843881]:  Starting store flush...
2020-06-01 23:20:23.422+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by full backup [3843881]:  Store flush completed
2020-06-01 23:20:23.422+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by full backup [3843881]:  Starting appending check point entr$
2020-06-01 23:20:23.425+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by full backup [3843881]:  Appending check point entry into th$
2020-06-01 23:20:23.425+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by full backup [3843881]:  Check pointing completed
2020-06-01 23:20:23.425+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [944]:  Starting log pruning.
2020-06-01 23:20:23.426+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [944]:  Log pruning complete.
2020-06-01 23:20:50.548+0000 WARN [o.n.k.i.c.MonitorGc] GC Monitor: Application threads blocked for 204ms.

Another fail....

It happens all the time!

elaine_rosenber · ‎06-01-2020

In the operations manual for later versions of Neo4j, the backup listen address must be in the range 6362-6372. Would you be able to use that range for a local backup?

Elaine

deemeetree · ‎06-01-2020

No, it is not possible to use that range.

I also don't see why this should be an issue?

If a different port is set in the config file it should work, no?

deemeetree · ‎06-01-2020

Also, I don't think the backup fails because of the port. First, it starts and always runs for a different length of time before failing. Second, when I was doing it remotely it nearly reached the end.

deemeetree · ‎06-02-2020

@elaine.rosenberg as you suggested, I changed the port to 6362 (my hosting provider agreed to do that) and tried to run the backup again.

Fail again. This is what I see:

2020-06-02 10:33:22.540+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.propertystore.db 28.40 GB
2020-06-02 10:33:22.645+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.relationshipstore.db
command failed: Backup failed: Unexpected Exception

The log simply says:

2020-06-02 10:40:02.459+0000 WARN [o.n.k.i.c.MonitorGc] GC Monitor: Application threads blocked for 235ms.
2020-06-02 10:49:24.324+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [3844003]:  Starting check poi$
2020-06-02 10:49:24.324+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [3844003]:  Starting store flu$
2020-06-02 10:49:24.483+0000 INFO [o.n.k.i.s.c.CountsTracker] About to rotate counts store at transaction 3844003 to [/$
2020-06-02 10:49:24.490+0000 INFO [o.n.k.i.s.c.CountsTracker] Successfully rotated counts store at transaction 3844003 to [/$
2020-06-02 10:49:28.943+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [3844003]:  Store flush comple$
2020-06-02 10:49:28.943+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [3844003]:  Starting appending$
2020-06-02 10:49:28.944+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [3844003]:  Appending check po$
2020-06-02 10:49:28.945+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [3844003]:  Check pointing com$
2020-06-02 10:49:28.945+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [944]:  Starting log pruning.
2020-06-02 10:49:28.947+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [944]:  Log pruning complete.

The database is now totally verified and checked for consistency.

There is definitely a problem with how backup in Neo4J is implemented.

I've been trying to do this for about 3 weeks already.

I tried it locally, remotely, using different hosting environments, doing it offline and then online — and the online backup is simply NOT working.

Shall we just call this thing off and admit that online backups in the commercial version of Neo4J simply don't work?

On my side I guess I will have to start looking for an alternative, like TigerGraph or something similar.

dana_canzano · ‎06-02-2020

@deemeetree
@elaine.rosenberg asked me to look into this.

We have a number of commercial customers who successfully run backup in production.
Is there a reason to use Neo4j 3.3? Yes, it should work, but 3.3 is not our most recent Neo4j release. Are you required to use 3.3 and not, for example, 3.5.x or 4.0.x?
Through the updates there is a comment of

The DB I'm backing up is Enterprise 3.3.3 and the one I'm backing up with is 3.5.14
Does this imply the database is in 3.5.14 format but you are using a 3.3.x neo4j-admin to perform the backup? If so, is there a reason to back up 3.5.14 with 3.3.x and not simply use a 3.5.x neo4j-admin backup against a 3.5.x database?

Regarding the setting of HEAP_SIZE under 3.3.x: there was a fix in 3.5, so prior releases may not properly recognize this variable when a neo4j-admin command is run on the same machine as a running Neo4j instance.

Can you rerun the backup but preface the neo4j-admin command with bash -x, i.e.

bash -x ./neo4j-admin backup .... .... .....

The inclusion of bash -x should provide more detail about when the java command is invoked to start the backup. Specifically, I'm interested in a line of output similar to

+ exec /usr/bin/java -XX:+UseParallelGC -classpath '/home/neo4j/cluster/instance1/neo4j-enterprise-3.5.18/plugins:/home/neo4j/cluster/instance1/neo4j-enterprise-3.5.18/conf:/home/neo4j/cluster/instance1/neo4j-enterprise-3.5.18/lib/*:/home/neo4j/cluster/instance1/neo4j-enterprise-3.5.18/plugins/*' -Dfile.encoding=UTF-8 org.neo4j.commandline.admin.AdminTool backup --backup-dir=/tmp/

deemeetree · ‎06-07-2020

Well, I'm running it from 3.3 because I want to back it up first and to then update it to a higher version. I cannot use 4.0 because there are breaking changes (with apoc), so I'm stuck with 3.5 I guess.

I did what you recommended and ran it with the bash. Here is the output:

+ exec /usr/bin/java -classpath '/home/path/neo4j-enterprise-3.3.3/plugins:/home/path/neo4j-enterprise-3.3.3/conf:/home/path/neo4j-enterprise-3.3.3/lib/*:/home/path/neo4j-enterprise-3.3.3/plugins/*' -Dfile.encoding=UTF-8 org.neo4j.commandline.admin.AdminTool backup --backup-dir=backups --name=graph.db --from=localhost:6362 --timeout=59m --pagecache=1G

Right after that it says "Doing full backup..." (and failing, as always). Right before that line I have the following output:

++ /usr/bin/java -version
++ awk -F '"' '/version/ {print $2}'
+ JAVA_VERSION=1.8.0_252
+ [[ 1.8.0_252 < 1.8 ]]
+ /usr/bin/java -version
+ egrep -q '(Java HotSpot\(TM\)|OpenJDK|IBM) (64-Bit Server|Server|Client|J9) VM'
+ build_classpath
+ CLASSPATH='/home/path/neo4j-enterprise-3.3.3/plugins:/home/path/neo4j-enterprise-3.3.3/conf:/home/path/neo4j-enterprise-3.3.3/lib/*:/home/path/neo4j-enterprise-3.3.3/plugins/*'
+ EXTRA_JVM_ARGUMENTS=-Dfile.encoding=UTF-8
+ class_name=org.neo4j.commandline.admin.AdminTool
+ shift
+ export NEO4J_HOME NEO4J_CONF

dana_canzano · ‎06-07-2020

from your output of

exec /usr/bin/java -classpath '/home/path/neo4j-enterprise-3.3.3/plugins:/home/path/neo4j-enterprise-3.3.3/conf:/home/path/neo4j-enterprise-3.3.3/lib/*:/home/path/neo4j-enterprise-3.3.3/plugins/*' -Dfile.encoding=UTF-8 org.neo4j.commandline.admin.AdminTool backup --backup-dir=backups --name=graph.db --from=localhost:6362 --timeout=59m --pagecache=1G

we see --timeout=59m --pagecache=1G

from which we see that min/max heap is not defined, for if it were we would see a reference to -Xms and -Xmx in the above line. So Java will simply default based upon the amount of free RAM when started.

How much total RAM is on the instance and how much is free?

Also, with regard to --timeout=59m, I'm not familiar with this argument to java and how it is established. If it is meant to time out after 59 minutes I don't suspect it is in play here, but when I run backup I do not have it on my command line, which reports

+ exec /usr/bin/java -classpath '/home/neo4j/cluster/instance1/neo4j-enterprise-3.3.3/plugins:/home/neo4j/cluster/instance1/neo4j-enterprise-3.3.3/conf:/home/neo4j/cluster/instance1/neo4j-enterprise-3.3.3/lib/*:/home/neo4j/cluster/instance1/neo4j-enterprise-3.3.3/plugins/*' -Dfile.encoding=UTF-8 org.neo4j.commandline.admin.AdminTool backup --backup-dir=/tmp --name=graph.db
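To answer the question above about total and free RAM on a Linux host, the figures can be read from /proc/meminfo. This is a minimal sketch assuming a POSIX shell with awk. The function name mem_summary_mib is hypothetical, and the optional file argument exists only so the function can be tried against a saved copy of meminfo.

```shell
#!/bin/sh
# Hypothetical helper: print MemTotal and MemAvailable in MiB from a
# Linux meminfo-style file (defaults to /proc/meminfo).
mem_summary_mib() {
    file="${1:-/proc/meminfo}"
    awk '
        $1 == "MemTotal:"     { total = int($2 / 1024) }
        $1 == "MemAvailable:" { avail = int($2 / 1024) }
        END { printf "total=%dMiB available=%dMiB\n", total, avail }
    ' "$file"
}
```

Since the exec line shows no -Xms or -Xmx, the JVM picks its own heap size. Running java -XX:+PrintFlagsFinal -version and looking for MaxHeapSize shows the maximum heap that JVM would choose by default on the machine. Comparing that value plus the --pagecache setting against the available figure gives a rough idea of whether the backup process fits in memory.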

deemeetree · ‎06-10-2020

@dana.canzano, well, the parameters timeout and pagecache are the neo4j-admin params I found in your manual.

Shall I run it without?

RAM is 8G, but about 4G is actually in use at any given moment.

So what should I do? How to run the backup?

jonomac · ‎01-17-2023

Hi @deemeetree resurrecting an old thread, but did you find out what was the core reason for your issues in the end?
Always useful to state a resolution for others who might end up stuck down the same hole. I'm currently attempting to backup 4.4 and migrate to 5.3.

Can't backup remote database using Neo4J