Neo4j

szenyo · 04-12-2021

I would like to know whether it is possible to handle a few hundred databases (or even more) with the multi-database feature.
Does anyone have experience with this?

I ran an experiment with default settings on my Mac and found that after roughly 250 databases had been created inside one instance, the server started to struggle and throw "Too many open files" exceptions, and newly created databases went into the FAILED state.
I tried raising the maximum number of open files and the maximum number of files per process, but it still fails after about the same number of databases.

Any hints on this topic?
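
For reference, the open-file limits mentioned above can be inspected from Python's standard `resource` module. This is only a minimal sketch: it reports the limits of the Python process itself (normally inherited from the shell that launched it), not those of a running Neo4j JVM, and the helper names `open_file_limits` and `describe` are made up for illustration.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file-descriptor limits of this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


def describe(limit):
    """Render a limit value, mapping the platform's 'infinite' marker to text."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = open_file_limits()
    # The soft limit is the one actually enforced; it can be raised up to the hard limit.
    print(f"soft limit: {describe(soft)}")
    print(f"hard limit: {describe(hard)}")
```

If the soft limit printed here is much lower than the value you believe you configured, the change probably did not reach the shell (and therefore possibly not the process) that starts the server.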

dana_canzano · 04-13-2021

A couple of thoughts:

a. How would you handle the mechanics and logistics of backing up and/or restoring 250+ databases?
b. Each database is going to consume disk storage to hold the graph itself (i.e. data/databases/) as well as the transaction logs (i.e. data/transactions/). So presumably you are going to need a fairly large file system.
c. There is only one JVM, with one heap and one page cache, to be shared amongst these NNN databases.
d. As you appear to have encountered, the user who starts the Neo4j JVM is subject to a max open files limit on Linux. Our default is 40k. You could certainly increase it, but at some point you may simply hit the absolute maximum for open files.
e. If you have metrics.csv.enabled=true in conf/neo4j.conf (see Configuration settings in the Operations Manual), then, because some metrics are database-specific by name, each of your NNN databases would get its own metric CSV files. Again, more file system storage requirements.

Also, given that you say this was configured on a Mac, which we wouldn't expect to see in production, some of the above may be different on a production-grade server.
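
To get a rough feel for points b and e, one can count how many files each database contributes on disk. The sketch below assumes the directory layout named above (data/databases/ and data/transactions/ under the Neo4j home directory); the function name `count_files_per_database` is hypothetical, and files on disk are only a proxy for open file handles, not an exact measure of them.

```python
import os
from collections import Counter


def count_files_per_database(neo4j_home):
    """Count files per database under data/databases and data/transactions.

    Returns a Counter mapping database directory name -> number of files.
    Missing directories are skipped, so an empty Counter means nothing was found.
    """
    counts = Counter()
    for subdir in ("databases", "transactions"):
        root = os.path.join(neo4j_home, "data", subdir)
        if not os.path.isdir(root):
            continue
        for db_name in os.listdir(root):
            db_path = os.path.join(root, db_name)
            if not os.path.isdir(db_path):
                continue  # ignore stray top-level files
            # Walk the database directory and add up every file in it.
            for _dirpath, _dirnames, filenames in os.walk(db_path):
                counts[db_name] += len(filenames)
    return counts


if __name__ == "__main__":
    import sys

    home = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, n in sorted(count_files_per_database(home).items()):
        print(f"{name}: {n} files")
```

Multiplying a typical per-database count by the planned number of databases gives a first estimate of how quickly a 40k open-file limit could come into play.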

szenyo · 04-14-2021

Good points, thanks. I am about to find out if it is possible or not.

My findings so far:

I had to switch off metrics to be able to handle that number of files.
Yes, I know the production platform would be different.
I also had to play with these parameters:

dbms.jvm.additional=-XX:-MaxFDLimit
dbms.jvm.additional=-Dorg.neo4j.io.pagecache.impl.SingleFilePageSwapper.channelStripePower=2

But after three days of experimenting, I have to say it does not look like a good idea. My original motivation was to solve the data separation issue, but this approach is quite limited and this setup has a lot of bottlenecks. The number of manageable databases looks to be around a dozen.

I think I will look for another option, such as handling the separation in the data model.

Number of databases when using multi database feature