Causal cluster Discovery port listen 0.0.0.0:5000 doesn't work

Hello.
I’m facing a pretty critical issue and I’m pretty sure that this is a Neo4j bug.
There is already an existing GitHub issue for it, https://github.com/neo4j/neo4j/issues/12221, and I added more findings from my setup: https://github.com/neo4j/neo4j/issues/12221#issuecomment-505883604

Long story short:
I’m trying to set up a Neo4j Causal Cluster on AWS ECS with awsvpc networking. The awsvpc mode attaches a separate network interface (with its own IP from the VPC) to the Docker container.

Here are the problems I’m facing:

  1. Setting causal_clustering.discovery_listen_address doesn’t work properly. When it is set to 0.0.0.0:5000, the connection info in the logs shows the expected values:

    Discovery:   listen=0.0.0.0:5000, advertised=10.10.1.128:5000
    

    but in practice, running lsof -i tcp | grep neo | grep LISTEN shows that, while the other ports listen properly on *:7000 or *:6000, the Discovery port is still bound to a specific IP (as stated in the issue), for example:

    java      6 neo4j  233u  IPv4 427591      0t0  TCP 169.254.172.28:5000 (LISTEN)
    
  2. When the Discovery port is not bound properly to *:5000, it looks like it gets bound to a random available network interface. In my case the ECS containers have two interfaces (excluding loopback):

    Interface ecs-eth0: 
         address: 169.254.172.x
    

    and

    Interface eth0: 
          address: 10.10.x.x
    

    (this data comes from the debug.log file), where ecs-eth0 is some internal ECS interface I don’t care about and eth0 is the one that should handle communication. The problem is that when Neo4j binds the listen port to eth0, everything works fine; when it binds to ecs-eth0, port 5000 is unreachable and the node can’t join the cluster. This happens completely at random; see more logs in the GitHub issue comment https://github.com/neo4j/neo4j/issues/12221#issuecomment-505940194

  3. Setting causal_clustering.discovery_listen_address=10.10.x.x:5000 doesn’t work either. Even though I can see the correct bind info from lsof:

    TCP ip-10-10-1-72.ec2.internal:5000 (LISTEN)
    

    and in the connection info log

    Discovery:   listen=10.10.1.72:5000, advertised=10.10.1.72:5000
    

    it still doesn’t work (the nodes simply don’t discover each other, even though I can reach port 5000 with nc), and it is a complete mystery to me why.
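
For anyone trying to reproduce this, the nc reachability check on port 5000 can be scripted and run from another cluster member. This is only a minimal sketch: `probe` is a hypothetical helper name, it uses nothing but the Python standard library, and it only tells you whether a TCP connection can be opened, not whether Neo4j discovery itself succeeds (which is exactly the gap described in point 3).

```python
import socket


def probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Roughly equivalent to a plain `nc` connect check: it proves the port
    is reachable, nothing more.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable network, and timeouts.
        return False
```

Calling, for example, `probe("10.10.1.128", 5000)` and `probe("169.254.172.28", 5000)` from a peer node (addresses taken from the logs above, substitute your own) should show which interface the Discovery port actually answers on.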

Thanks!
