Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Causal clustering plugins and write transactions

I am having a hard time figuring out how I can programmatically detect whether I am running in single-instance or clustering mode, and how I could transparently proxy queries made with GraphDatabaseService to the leader of a cluster.

Is it possible to have a core always redirect writes using the Neo4j Java API (not a driver) from a plugin?

In reality what I need is read-only copies, but the HA clustering section says that HA clustering is somewhat deprecated. Should I use it regardless?


Could you explain what you need this functionality for?

For redirection you'd still need to use the Java driver between instances.
Also, the Java API itself doesn't support causal clusters, because you cannot determine upfront whether you're running a read or a write transaction.

It is not a problem if I have to specify that it is a write operation; I'd just like to be able to transparently execute a write query on any of the members of the core cluster.

Right now I just want read copies, but as far as I know, causal clusters need a minimum of 2 cores, and should really have at least 3. Therefore, the extra complexity of writing to the database is not something I am a big fan of.

What is the recommended way of setting up a main database (reads/writes) with additional read-only replicas?
I do not need the guarantee of being able to write at all times, but I would like to have backups and the ability to offload some computationally heavy tasks to read-only replicas.

That's where (and why) you use a driver with the bolt+routing protocol, which is aware of the cluster topology and of whether you're running a read or write transaction, and routes appropriately (it even retries if the cluster topology changed during your operation).

It also takes care of load balancing between core and read-replica instances.
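As a sketch of what this looks like with the 1.x Java driver (the address `bolt+routing://my-cluster:7687` and the credentials are hypothetical):

```java
import org.neo4j.driver.v1.AuthTokens;
import org.neo4j.driver.v1.Driver;
import org.neo4j.driver.v1.GraphDatabase;
import org.neo4j.driver.v1.Session;
import org.neo4j.driver.v1.Values;

public class RoutingExample {
    public static void main(String[] args) {
        // bolt+routing makes the driver topology-aware: writes go to the
        // leader, reads are load-balanced across cores and read replicas.
        try (Driver driver = GraphDatabase.driver(
                "bolt+routing://my-cluster:7687",
                AuthTokens.basic("neo4j", "secret"));
             Session session = driver.session()) {

            // Routed to the leader; retried if the topology changes mid-flight.
            session.writeTransaction(tx ->
                    tx.run("CREATE (:Person {name: $name})",
                           Values.parameters("name", "Alice")));

            // Routed to a follower or read replica.
            session.readTransaction(tx ->
                    tx.run("MATCH (p:Person) RETURN count(p)").single());
        }
    }
}
```

The write and read lambdas are retryable units of work, which is what lets the driver re-run them transparently after a leader switch.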

So when making plugins, one should never use GraphDatabaseService to do transactions, but instead use a driver?

Also, for the setup I described, does it make sense to use a causal cluster? Or is there some other way of getting read replicas?

There's no need to transparently proxy things to the leader; this is done for you by a bolt+routing driver, as Michael says. But it is also OK to use a GraphDatabaseService to do transactions. I think the missing piece here is that if I did transactions inside of a GraphDatabaseService, I'd generally be doing them only on the leader, automatically, with no extra code needed, because whatever that code is would only be invoked on the leader.

For example: you write a stored procedure with the annotation @Procedure(name = "myPlugin.writeSomeStuff", mode = Mode.WRITE)

Inside of that procedure, you use a GraphDatabaseService to write some stuff, and then stream some results back. All good.
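A minimal sketch of such a procedure using the 3.x procedure API (the output type, the `label` parameter, and the plugin class name are illustrative):

```java
import java.util.stream.Stream;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;
import org.neo4j.procedure.Context;
import org.neo4j.procedure.Description;
import org.neo4j.procedure.Mode;
import org.neo4j.procedure.Name;
import org.neo4j.procedure.Procedure;

public class MyPlugin {

    // Injected by Neo4j. Inside a Mode.WRITE procedure, the calling
    // Cypher query already runs in a write transaction on the leader.
    @Context
    public GraphDatabaseService db;

    public static class Output {
        public long nodeId;
        public Output(long nodeId) { this.nodeId = nodeId; }
    }

    @Procedure(name = "myPlugin.writeSomeStuff", mode = Mode.WRITE)
    @Description("Creates a node with the given label and streams its id back.")
    public Stream<Output> writeSomeStuff(@Name("label") String label) {
        // Participates in the surrounding write transaction.
        Node node = db.createNode(Label.label(label));
        return Stream.of(new Output(node.getId()));
    }
}
```

It would then be invoked from Cypher as `CALL myPlugin.writeSomeStuff('Person')`.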

Now, that plugin is installed on all 3 nodes, but the procedure never gets called anywhere but the leader, because the client routes explicit write transactions (and autocommit transactions) to the leader. So Cypher that calls the procedure, combined with bolt+routing, basically takes this concern away so you don't have to worry about it.

If you did manually call that write procedure on a follower, it would fail -- because followers cannot accept writes. Fortunately if you set things up right, this just won't arise. Extra cores and read replicas scale out your read workload.

This would not be true for asynchronous procedures, though.

And what about transaction listeners? If a transaction listener mutated the database on certain queries, wouldn't that pose a problem? That last issue is hypothetical, and could be solved by calling dbms.cluster.overview() and checking the result, but that is overly complicated. There should be an easy way of telling whether you are the leader or a follower.

This also makes it more complicated to use the Neo4j Browser, as you have to manually find the leader and then execute queries there, which I think is a poor user experience. It requires knowledge of the underlying structure of the cluster, and even though the Neo4j instances may communicate on a network, that does not mean all replicas are reachable by all clients.

To find whether a node is leader or follower, CALL dbms.cluster.role(); (Docs: https://neo4j.com/docs/operations-manual/current/monitoring/causal-cluster/procedures/#dbms.cluster....)

The Neo4j Browser can also accept bolt+routing as the address you connect to. By default, say it attempts to connect to bolt://my-cluster. If you instead connect to bolt+routing://my-cluster, the cluster topology is transparent to you. You can run both read and write queries, and the browser will route them wherever is appropriate.

On the transaction listener, I'm not sure. I may look into this.

Best to write your extensions as procedure then you can call them from Cypher and they are executed in the right context (read vs. write) and transaction.

If you want to check within a procedure what state the current instance has, you can use something like this:
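For example (a sketch, assuming the 3.x procedure API; `dbms.cluster.role()` is Enterprise-only and the class name is illustrative):

```java
import java.util.Map;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Result;
import org.neo4j.procedure.Context;

public class RoleCheck {

    // Injected by Neo4j into the procedure class.
    @Context
    public GraphDatabaseService db;

    // Returns "LEADER", "FOLLOWER" or "READ_REPLICA" on a cluster member.
    // On a standalone instance the procedure does not exist, so the call
    // throws a QueryExecutionException, which can be treated as "standalone".
    public String currentRole() {
        try (Result result = db.execute("CALL dbms.cluster.role()")) {
            Map<String, Object> row = result.next();
            return row.get("role").toString();
        }
    }
}
```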

I am creating a causal cluster of 2 nodes as per the instructions at https://neo4j.com/docs/operations-manual/current/clustering/setup-new-cluster/ but I am getting this error:

2019-02-07 06:24:32.198+0000 INFO ======== Neo4j 3.5.2 ========
2019-02-07 06:24:32.201+0000 INFO Starting...
2019-02-07 06:24:33.080+0000 INFO Initiating metrics...
2019-02-07 06:24:33.111+0000 INFO My connection info: [
Discovery: listen=172.31.38.35:5000, advertised=172.31.38.35:5000,
Transaction: listen=172.31.38.35:6000, advertised=172.31.38.35:6000,
Raft: listen=172.31.38.35:7000, advertised=172.31.38.35:7000,
Client Connector Addresses: bolt://172.31.38.35:7687,http://172.31.38.35:7474,https://172.31.38.35:7473
]
2019-02-07 06:24:33.111+0000 INFO Discovering other core members in initial members set: [172.31.38.24:5000, 172.31.38.35:5000]
2019-02-07 06:24:41.893+0000 INFO Bound to cluster with id 0e47df3c-4d53-4d27-86ef-7a3ca706be66
2019-02-07 06:24:41.911+0000 INFO Discovered core member at 172.31.38.24:5000
2019-02-07 06:24:55.744+0000 INFO Connected to /172.31.38.24:7000 [raft version:2]
2019-02-07 06:25:10.539+0000 INFO Waiting to hear from leader...
2019-02-07 06:25:38.541+0000 INFO Waiting to hear from leader...
2019-02-07 06:26:06.542+0000 INFO Waiting to hear from leader...
2019-02-07 06:26:34.543+0000 INFO Waiting to hear from leader...
2019-02-07 06:27:02.543+0000 INFO Waiting to hear from leader...
2019-02-07 06:27:30.544+0000 INFO Waiting to hear from leader...
2019-02-07 06:27:58.545+0000 INFO Waiting to hear from leader...
2019-02-07 06:28:26.546+0000 INFO Waiting to hear from leader...
2019-02-07 06:28:54.547+0000 INFO Waiting to hear from leader...
2019-02-07 06:29:22.548+0000 INFO Waiting to hear from leader...
2019-02-07 06:29:50.549+0000 INFO Waiting to hear from leader...
2019-02-07 06:30:18.550+0000 INFO Waiting to hear from leader...
2019-02-07 06:30:46.551+0000 INFO Waiting to hear from leader...
2019-02-07 06:31:14.551+0000 INFO Waiting to hear from leader...
2019-02-07 06:31:42.552+0000 INFO Waiting to hear from leader...
2019-02-07 06:32:10.554+0000 INFO Waiting to hear from leader...
2019-02-07 06:32:38.555+0000 INFO Waiting to hear from leader...
2019-02-07 06:33:06.555+0000 INFO Waiting to hear from leader...
2019-02-07 06:33:34.556+0000 INFO Waiting to hear from leader...
2019-02-07 06:34:02.557+0000 INFO Waiting to hear from leader...
2019-02-07 06:34:30.558+0000 INFO Waiting to hear from leader...
2019-02-07 06:34:50.673+0000 INFO Lost connection to /172.31.38.24:7000 [raft version:2]
2019-02-07 06:34:58.558+0000 INFO Waiting to hear from leader...
2019-02-07 06:35:26.559+0000 INFO Waiting to hear from leader...
2019-02-07 06:35:54.560+0000 INFO Waiting to hear from leader...
2019-02-07 06:36:22.560+0000 INFO Waiting to hear from leader...
2019-02-07 06:36:50.561+0000 INFO Waiting to hear from leader...
2019-02-07 06:37:18.562+0000 INFO Waiting to hear from leader...
2019-02-07 06:37:39.238+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@7a0ef219' was successfully initialized, but failed to start. Please see the attached cause exception "null". Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@7a0ef219' was successfully initialized, but failed to start. Please see the attached cause exception "null".
org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@7a0ef219' was successfully initialized, but failed to start. Please see the attached cause exception "null".
at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:45)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:184)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:123)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:90)
at com.neo4j.server.enterprise.CommercialEntryPoint.main(CommercialEntryPoint.java:22)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.database.LifecycleManagingDatabase@7a0ef219' was successfully initialized, but failed to start. Please see the attached cause exception "null".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:473)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:177)
... 3 more
Caused by: java.lang.RuntimeException: Error starting org.neo4j.graphdb.facade.GraphDatabaseFacadeFactory, /home/sakshi/neo4j-enterprise-3.5.2/data/databases
at org.neo4j.graphdb.facade.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:216)
at com.neo4j.causalclustering.core.CommercialCoreGraphDatabase.(CommercialCoreGraphDatabase.java:28)
at com.neo4j.server.database.CommercialGraphFactory.newGraphDatabase(CommercialGraphFactory.java:36)
at org.neo4j.server.database.LifecycleManagingDatabase.start(LifecycleManagingDatabase.java:78)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
... 5 more
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.causalclustering.core.consensus.membership.MembershipWaiterLifecycle@5ec4ff02' was successfully initialized, but failed to start. Please see the attached cause exception "null".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:473)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
at org.neo4j.graphdb.facade.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:211)
... 9 more
Caused by: java.lang.RuntimeException: Server failed to join cluster within catchup time limit [600000 ms]
at org.neo4j.causalclustering.core.consensus.membership.MembershipWaiterLifecycle.start(MembershipWaiterLifecycle.java:55)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
... 11 more
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at org.neo4j.causalclustering.core.consensus.membership.MembershipWaiterLifecycle.start(MembershipWaiterLifecycle.java:43)
... 12 more
2019-02-07 06:37:39.239+0000 INFO Neo4j Server shutdown initiated by request

Please suggest where I am going wrong.

Please consider taking cluster formation issues to another thread, but briefly I should say that forming a cluster of 2 is not a good idea; clusters should have odd numbers of members, with 3 as a minimum. This allows HA guarantees on the data. In order to write you require a quorum of members to agree, and if one machine out of a 2 node cluster goes away, you cannot get a majority anymore and you'll have a read-only database.

Additionally, Neo4j has some configuration items that look for a minimum cluster size before forming, which for these reasons is typically 3.
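For reference, these are the relevant settings in neo4j.conf for 3.5 causal clustering (the host names are placeholders; verify values against your version's documentation):

```properties
# Minimum number of core members required to form the cluster initially.
causal_clustering.minimum_core_cluster_size_at_formation=3

# Minimum number of core members the cluster tries to maintain at runtime.
causal_clustering.minimum_core_cluster_size_at_runtime=3

# Discovery addresses of the initial core members; must match on all cores.
causal_clustering.initial_discovery_members=core1:5000,core2:5000,core3:5000
```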

Using three nodes to create the cluster, I am getting a similar error.

2019-02-08 06:45:25.715+0000 INFO Discovered core member at 172.31.38.37:5000
2019-02-08 06:45:38.780+0000 INFO Connected to /172.31.38.24:7000 [raft version:2]
2019-02-08 06:45:54.197+0000 INFO Waiting to hear from leader...
2019-02-08 06:45:58.325+0000 INFO Lost connection to /172.31.38.24:7000 [raft version:2]
2019-02-08 06:45:58.635+0000 WARN Lost core member at 172.31.38.24:5000
2019-02-08 06:46:22.198+0000 INFO Waiting to hear from leader...
2019-02-08 06:46:50.199+0000 INFO Waiting to hear from leader...
2019-02-08 06:47:08.624+0000 INFO Connected to /172.31.38.24:7000 [raft version:2]
2019-02-08 06:47:12.226+0000 INFO Discovered core member at 172.31.38.24:5000
2019-02-08 06:47:18.200+0000 INFO Waiting to hear from leader...
2019-02-08 06:47:23.515+0000 INFO Connected to /172.31.38.37:7000 [raft version:2]
2019-02-08 06:47:46.201+0000 INFO Waiting to hear from leader...
2019-02-08 06:48:14.201+0000 INFO Waiting to hear from leader...
2019-02-08 06:48:42.202+0000 INFO Waiting to hear from leader...
2019-02-08 06:49:10.203+0000 INFO Waiting to hear from leader...
2019-02-08 06:49:38.204+0000 INFO Waiting to hear from leader...
2019-02-08 06:50:06.204+0000 INFO Waiting to hear from leader...
2019-02-08 06:50:34.205+0000 INFO Waiting to hear from leader...
2019-02-08 06:51:02.206+0000 INFO Waiting to hear from leader...
2019-02-08 06:51:30.207+0000 INFO Waiting to hear from leader...
2019-02-08 06:51:58.208+0000 INFO Waiting to hear from leader...
2019-02-08 06:52:26.209+0000 INFO Waiting to hear from leader...
2019-02-08 06:52:54.210+0000 INFO Waiting to hear from leader...
2019-02-08 06:53:22.210+0000 INFO Waiting to hear from leader...
2019-02-08 06:53:50.211+0000 INFO Waiting to hear from leader...
2019-02-08 06:54:18.212+0000 INFO Waiting to hear from leader...
2019-02-08 06:54:46.213+0000 INFO Waiting to hear from leader...
2019-02-08 06:55:14.214+0000 INFO Waiting to hear from leader...
2019-02-08 06:55:33.394+0000 INFO Lost connection to /172.31.38.37:7000 [raft version:2]
2019-02-08 06:55:33.394+0000 INFO Lost connection to /172.31.38.24:7000 [raft version:2]
2019-02-08 06:55:42.215+0000 INFO Waiting to hear from leader...
2019-02-08 06:55:45.499+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@5e1d03d7' was successfully initialized, but failed to start. Please see the attached cause exception "null". Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@5e1d03d7' was successfully initialized, but failed to start. Please see the attached cause exception "null".
org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabase@5e1d03d7' was successfully initialized, but failed to start. Please see the attached cause exception "null".
at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:45)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:184)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:123)
at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:90)
at com.neo4j.server.enterprise.CommercialEntryPoint.main(CommercialEntryPoint.java:22)
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.database.LifecycleManagingDatabase@5e1d03d7' was successfully initialized, but failed to start. Please see the attached cause exception "null".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:473)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:177)
... 3 more
Caused by: java.lang.RuntimeException: Error starting org.neo4j.graphdb.facade.GraphDatabaseFacadeFactory, /home/sakshi/neo4j-enterprise-3.5.2/data/databases
at org.neo4j.graphdb.facade.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:216)
at com.neo4j.causalclustering.core.CommercialCoreGraphDatabase.(CommercialCoreGraphDatabase.java:28)
at com.neo4j.server.database.CommercialGraphFactory.newGraphDatabase(CommercialGraphFactory.java:36)
at org.neo4j.server.database.LifecycleManagingDatabase.start(LifecycleManagingDatabase.java:78)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
... 5 more
Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.causalclustering.core.consensus.membership.MembershipWaiterLifecycle@5ec4ff02' was successfully initialized, but failed to start. Please see the attached cause exception "null".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:473)
at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
at org.neo4j.graphdb.facade.GraphDatabaseFacadeFactory.initFacade(GraphDatabaseFacadeFactory.java:211)
... 9 more
Caused by: java.lang.RuntimeException: Server failed to join cluster within catchup time limit [600000 ms]
at org.neo4j.causalclustering.core.consensus.membership.MembershipWaiterLifecycle.start(MembershipWaiterLifecycle.java:55)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:452)
... 11 more
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1771)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
at org.neo4j.causalclustering.core.consensus.membership.MembershipWaiterLifecycle.start(MembershipWaiterLifecycle.java:43)
... 12 more
2019-02-08 06:55:45.500+0000 INFO Neo4j Server shutdown initiated by request