Experimental Multi-DC Causal Cluster with two Core - Bolt Routing issue

We have an experimental Multi-DC Causal Cluster setup with two Cores.
For certain reasons, the LEADER role is pinned to one of them; the other is always a FOLLOWER.

Our policies:

causal_clustering.load_balancing.config.server_policies.usa=\
groups(us);
causal_clustering.load_balancing.config.server_policies.europe=\
groups(eu);

We want each machine to connect to its local instance whenever possible and to use the remote instance otherwise.

We are using the node.js neo4j driver, server version is 3.5.8 on both machines.

We do not use explicit transactions yet, but we specify driver.session(neo4j.session.READ) for sessions that we use only for reads.

This way, on the FOLLOWER, all READ-identified queries correctly go to the local FOLLOWER instance, while the unspecified queries go to the LEADER instance (again correctly).

The problem is on the LEADER instance, where the READ queries are apparently hitting the remote instance (that is, the FOLLOWER).

For now, I can circumvent the issue by disabling bolt routing on the LEADER instance, but we plan to scale this up properly, with more complicated nested regional policies, and I'm wondering if we'll have similar issues then as well.

The bolt URIs we're using are (one per region):
bolt+routing://<ip_of_local_instance>:7687?policy=usa
bolt+routing://<ip_of_local_instance>:7687?policy=europe
which, apart from the problem I described, work like a charm.
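
For illustration, here is a minimal sketch of how such a URI could be assembled from a host and a policy name. The helper name routingUri is hypothetical and not part of the neo4j driver; the policy names are the keys from the server_policies settings above, and the pairing of addresses with policies in the example calls is only an assumption.

```javascript
// Hypothetical helper (not a driver API): build a bolt+routing URI
// that carries the load-balancing policy name as a query parameter.
function routingUri(host, policy, port = 7687) {
  return `bolt+routing://${host}:${port}?policy=${encodeURIComponent(policy)}`;
}

// Example calls; which address belongs to which region is an assumption.
console.log(routingUri("10.10.1.1", "usa"));
console.log(routingUri("10.20.1.1", "europe"));
```

The resulting string would then be passed to the driver constructor in place of a hard-coded URI.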

I went through all the docs twice and double- and triple-checked all our configs, the group assignments, and the policy definitions, but couldn't identify any mistake.

We suspect that we are not understanding something fundamental about how Routing works.

Our routing table looks like this (10.10.1.1 is the LEADER, 10.20.1.1 is the FOLLOWER):

ttl   server.role   server.addresses
300   "WRITE"       ["10.10.1.1:7687"]
300   "READ"        ["10.20.1.1:7687"]
300   "ROUTE"       ["10.20.1.1:7687", "10.10.1.1:7687"]
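
To make the table concrete, here is a simplified model of how a client could pick candidate servers from it by access mode. This is only a sketch of the idea, not the driver's actual implementation; the helper name candidatesFor is hypothetical.

```javascript
// The routing table from above (10.10.1.1 = LEADER, 10.20.1.1 = FOLLOWER).
const routingTable = [
  { ttl: 300, role: "WRITE", addresses: ["10.10.1.1:7687"] },
  { ttl: 300, role: "READ", addresses: ["10.20.1.1:7687"] },
  { ttl: 300, role: "ROUTE", addresses: ["10.20.1.1:7687", "10.10.1.1:7687"] },
];

// Hypothetical helper: list the servers eligible for a given access mode.
// Returns an empty array when the table has no row for that mode.
function candidatesFor(table, accessMode) {
  const entry = table.find((row) => row.role === accessMode);
  return entry ? entry.addresses : [];
}

console.log(candidatesFor(routingTable, "READ"));
console.log(candidatesFor(routingTable, "WRITE"));
```

In this model the LEADER address never appears among the READ candidates, which matches the behaviour described: a READ session opened on the LEADER machine has only the remote FOLLOWER to choose from.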

Thanks for any help from anyone, we are out of ideas.

1 ACCEPTED SOLUTION

The driver routes only writeTransaction() queries to the leader node. readTransaction() queries are routed only to follower and read replica nodes in the cluster, so in a 2-node cluster reads will only ever go to the single follower.

I think you would need a minimum of 3 nodes in the cluster to do what you want (an additional node deployed alongside your leader), while ensuring that the single node in the other data center is configured to refuse to become leader.
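
As a rough sketch of why a third node helps, the model below adds a second core next to the leader and picks readers by server group. Everything here is an assumption for illustration: the address 10.10.1.2, the placement of the leader in the "us" group, the helper name readersFor, and the fallback to any reader when the local group is empty (which mirrors the "use the remote instance otherwise" wish from the question rather than documented server behaviour).

```javascript
// Hypothetical three-core layout; 10.10.1.2 and the group assignments are assumed.
const servers = [
  { address: "10.10.1.1:7687", role: "LEADER", groups: ["us"] },
  { address: "10.10.1.2:7687", role: "FOLLOWER", groups: ["us"] },
  { address: "10.20.1.1:7687", role: "FOLLOWER", groups: ["eu"] },
];

// Simplified model: the leader is never a reader. Prefer readers in the
// requested group; if there are none, fall back to every available reader.
function readersFor(allServers, group) {
  const readers = allServers.filter((s) => s.role !== "LEADER");
  const local = readers.filter((s) => s.groups.includes(group));
  return (local.length > 0 ? local : readers).map((s) => s.address);
}

console.log(readersFor(servers, "us"));
console.log(readersFor(servers, "eu"));
```

With the extra follower in the leader's data center, a read from that data center has a local target instead of crossing to the remote one.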
