Neo4j

Dutzu · ‎03-07-2019

Hello,

It has come to this point: how do we handle multi-tenancy?

Obvious choices:

single-tenant - 1 instance per tenant (would get hairy once the number of tenants grows and we have data that should be shared between tenants, and which would need effort to keep in sync on all instances)
Tenant Label - tedious and error prone
Referenced Tenant node - perhaps a bit simpler to get your head around, but still tedious and error prone

From what I saw in other threads, there is already multi-tenancy on the roadmap, but it's probably still at least half a year away: Proper way to implement multi-tenancy on Neo4j

Another set of approaches is documented pretty nicely in this article: https://www.experoinc.com/post/multi-tenant-applications-in-neo4j

I've also seen references suggesting that we could have achieved multi-tenancy if we had used the Java OGM, Ruby with Neo4j.rb (http://blog.vivekprahlad.com/multitenancy-with-neojrb/), or Gremlin instead of Cypher via PartitionStrategy (http://tinkerpop.apache.org/docs/3.1.0-incubating/#_partitionstrategy).

Well, too little too late for us now. We have 4 Node.js APIs that each communicate with Neo4j.

So, does anyone have a tip for us on how to approach this? Should we wait it out until Neo4j 4.0 is launched and until then deal with 1 db per tenant?

Is there any other "trick" we could use?

I was thinking even of something like, authentication with different credentials per tenant, and a Trigger or something in Neo4j that would filter results depending on the user making the call.

From what I understood, there is the possibility to enforce this via Subgraph access control (https://neo4j.com/docs/operations-manual/current/authentication-authorization/subgraph-access-contro...)

Is it worth the hassle or is there a simpler, better way?

Thank you,
Doru

david_allen · ‎03-07-2019

It sounds to me like you've done your research and you have an accurate picture of the space -- and you're aware of most of the main options.

Note that in multi-tenancy setups, you're always creating some separation between graphs; the question is really just at what level. You can separate them at the label level, at the graph level within a single database (that's the feature that is coming in the next version of Neo4j), or at a physical level by putting them in different databases.

I think we've seen all of those approaches, each according to how high a level of guaranteed separation you need. Physical separation makes it maximally difficult, if not impossible, for software errors in clients to reach the wrong data, while label or subgraph access control does very well. So which you pick kinda depends on what level of assurance you need and how sensitive your data is. In regulated environments, for example, often nothing less than physical separation will do, in part because even the administrative folks behind the scenes need to be locked out of datasets they shouldn't see.

There's no "trick" per se. Only choices & tradeoffs. The easiest/simplest way with the lowest level of assurance is to apply a label for each graph to every node in that graph, and then ensure with your client software that all of your queries always constrain what they're looking for to that label at a minimum. E.g. you can have a :Graph1:Client and a :Graph2:Client but you never query for a :Client.
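To make the label approach concrete, here is a minimal TypeScript sketch of a query builder that always includes the tenant label. The function name, the label pattern, and the `name` property are assumptions made for illustration, not part of any Neo4j API. Since Cypher parameters could not stand in for labels in the Neo4j versions discussed in this thread, the label is spliced into the query text, so the sketch validates it first.

```typescript
// Hypothetical helper: tenantScopedMatch and LABEL_PATTERN are illustrative names.
// The pattern is a deliberately conservative assumption, not Neo4j's full label grammar.
const LABEL_PATTERN = /^[A-Za-z_][A-Za-z0-9_]*$/;

function tenantScopedMatch(tenantLabel: string, entityLabel: string): string {
  // Labels end up in the query text itself, so reject anything that is not
  // a plain identifier before building the string.
  for (const label of [tenantLabel, entityLabel]) {
    if (!LABEL_PATTERN.test(label)) {
      throw new Error(`Refusing unsafe label: ${label}`);
    }
  }
  // Property values still go through ordinary query parameters ($name).
  return `MATCH (n:${tenantLabel}:${entityLabel}) WHERE n.name = $name RETURN n`;
}

const query = tenantScopedMatch("Graph1", "Client");
console.log(query);
```

Routing every query through one helper like this keeps the "never query for a bare :Client" rule in a single place instead of relying on each call site to remember it.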

The most complex/difficult method (but with the highest level of security assurance) will always be the physical separation. Everything else can be thought of as a midpoint on that continuum.
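On the client side, physical separation mostly comes down to routing each request to the right instance. The following TypeScript sketch shows one way that could look, assuming a map from tenant id to connection URI; the tenant ids, URIs, and the `createTenantClients` helper are all hypothetical. The `connect` callback is a stand-in: in a real service it would presumably create a Neo4j driver for the given URI.

```typescript
// Hypothetical sketch: lazily create one client per tenant and reuse it afterwards.
function createTenantClients<C>(
  uris: ReadonlyMap<string, string>, // tenant id -> connection URI (assumed configuration)
  connect: (uri: string) => C,       // stand-in for creating a real driver
): (tenantId: string) => C {
  const cache = new Map<string, C>();
  return (tenantId: string): C => {
    const uri = uris.get(tenantId);
    if (uri === undefined) {
      // Unknown tenants fail loudly instead of falling back to some default instance.
      throw new Error(`Unknown tenant: ${tenantId}`);
    }
    let client = cache.get(tenantId);
    if (client === undefined) {
      client = connect(uri);
      cache.set(tenantId, client);
    }
    return client;
  };
}

// Usage with a stub connect function that records which URIs were opened.
const tenantUris = new Map([
  ["acme", "bolt://acme-db:7687"],
  ["globex", "bolt://globex-db:7687"],
]);
const openedUris: string[] = [];
const clientFor = createTenantClients(tenantUris, (uri) => {
  openedUris.push(uri);
  return { uri };
});

clientFor("acme");
clientFor("acme"); // reuses the cached client
clientFor("globex");
console.log(openedUris);
```

Keeping one long-lived client per tenant and per service instance avoids reconnecting on every request; the trade-off is that the number of open clients grows with the number of tenants.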

The key question is how much separation you need for your multi-tenancy and what you're willing to adopt to get it.

Dutzu · ‎03-20-2019

Hello,

We decided to opt for the complete separation of data. So we will have 1 instance per tenant, until you guys release the multi-tenancy feature later this year. If the timeline has changed, please tell me.

We have multiple microservices that connect to neo and those microservices are of course horizontally scaled.

What are your recommendations regarding performance optimizations in this scenario (connection pools, pool sizes, connection lifetime, etc.)?

Thank you

dhaks_r · ‎03-08-2019

Thanks for the well-researched question.

I too am looking for similar options, so I can't answer your question.
But I am thinking about different options.

Can we have multiple databases active at the same time in Neo4j? I came across the link below. However, I am not seeing any direct Neo4j documentation that would let us proceed with its instructions. Any pointers appreciated.

Dutzu · ‎01-14-2020

Follow-up

I think this post on another thread is relevant to the topic and might be of interest to whoever follows this thread: Multi-Tenancy on Neo4j

david_allen · ‎01-14-2020

@Dutzu that's an excellent point. Neo4j 4.0 will be available soon (the betas & milestone release candidates have been out for some time), and when 4.0 is available, multidatabase will be available, so the issue in this thread will be solved in a new way.
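As a rough sketch of what that could look like from a Node.js service, the TypeScript below maps a tenant id to a session configuration. It assumes the 4.x JavaScript driver's `database` session option (as in `driver.session({ database: "..." })`); the tenant ids, database names, and the `sessionConfigFor` helper are hypothetical.

```typescript
// Hypothetical sketch: pick the database for a tenant and pass the result to
// driver.session(...). No driver is created here, so the block runs on its own.
interface TenantSessionConfig {
  database: string;
}

function sessionConfigFor(
  tenantId: string,
  tenantDatabases: ReadonlyMap<string, string>, // tenant id -> database name (assumed configuration)
): TenantSessionConfig {
  const database = tenantDatabases.get(tenantId);
  if (database === undefined) {
    // Fail instead of silently falling back to the default database.
    throw new Error(`No database configured for tenant: ${tenantId}`);
  }
  return { database };
}

const tenantDatabases = new Map([
  ["acme", "acme"],
  ["globex", "globex"],
]);
const acmeConfig = sessionConfigFor("acme", tenantDatabases);
console.log(acmeConfig);
// With a real driver this would be used as: driver.session(acmeConfig)
```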

For all those coming to this thread -- note that the advice above was only current as of March 2019. January 2020 says hello, and there are soon to be different answers available.
