06-26-2019 09:59 AM
Hello.
I’m facing a critical issue, and I’m fairly sure it is a Neo4j bug.
There is already a GitHub issue for it, https://github.com/neo4j/neo4j/issues/12221, and I added more findings from my setup: https://github.com/neo4j/neo4j/issues/12221#issuecomment-505883604
Long story short:
I’m trying to set up a Neo4j Causal Cluster on AWS ECS with awsvpc networking. awsvpc networking provides a separate network interface (with its own IP from the VPC) attached to the Docker container.
Here are the problems I’m facing:
Setting causal_clustering.discovery_listen_address doesn’t work properly. When it is set to 0.0.0.0:5000, the connection info in the logs shows the expected values:
Discovery: listen=0.0.0.0:5000, advertised=10.10.1.128:5000
In practice, though, running lsof -i tcp | grep neo | grep LISTEN shows that while the other ports listen properly on *:7000 or *:6000, the Discovery port is still bound to a specific IP (as stated in the issue), for example:
java 6 neo4j 233u IPv4 427591 0t0 TCP 169.254.172.28:5000 (LISTEN)
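The lsof check above can also be scripted. This is only a minimal Python sketch, assuming lsof output in the format shown here (the (LISTEN) marker last, host:port just before it). The function names are hypothetical and are not part of Neo4j or lsof.

```python
def parse_listen_sockets(lsof_output):
    """Return (host, port) string pairs for every LISTEN line in lsof output."""
    sockets = []
    for line in lsof_output.splitlines():
        fields = line.split()
        # Only lines that end with the "(LISTEN)" state marker are relevant.
        if len(fields) < 2 or fields[-1] != "(LISTEN)":
            continue
        # The NAME column (host:port) sits right before the state marker.
        host, _, port = fields[-2].rpartition(":")
        sockets.append((host, port))
    return sockets


def is_wildcard_bound(sockets, port):
    """True if the given port is bound on all interfaces (shown by lsof as '*')."""
    return any(host == "*" and p == str(port) for host, p in sockets)


# The line from the post: port 5000 is bound to one specific address.
sample = "java 6 neo4j 233u IPv4 427591 0t0 TCP 169.254.172.28:5000 (LISTEN)"
print(parse_listen_sockets(sample))
print(is_wildcard_bound(parse_listen_sockets(sample), 5000))
```

Ports are kept as strings because lsof may print service names instead of numbers unless it is run with -P.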
When the Discovery port is not bound to *:5000, it looks like it gets bound to a random available network interface. In my case, ECS containers have two interfaces (excluding loopback):
Interface ecs-eth0:
address: 169.254.172.x
and
Interface eth0:
address: 10.10.x.x
(this data comes from the debug.log file). ecs-eth0 is some internal ECS interface I don’t care about, and eth0 is the one that should handle communication. The problem is that when Neo4j binds the listen port to eth0, everything works fine; when it binds to ecs-eth0, port 5000 is unreachable and the node can’t join the cluster. This happens completely at RANDOM; see more logs in the GitHub issue comment https://github.com/neo4j/neo4j/issues/12221#issuecomment-505940194
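To illustrate which of the two addresses is the one that should be used, here is a rough Python sketch that picks the VPC address from a set of interfaces and skips loopback and link-local ones (169.254.x.x is in the link-local range). The function name, the interface dictionary, and the 10.10.0.0/16 CIDR are assumptions for illustration; the real VPC range isn’t stated above.

```python
import ipaddress


def pick_vpc_address(interfaces, vpc_cidr):
    """Return (name, address) of the first interface inside vpc_cidr.

    interfaces: dict mapping interface name -> IPv4 address string.
    vpc_cidr:   assumed VPC range, e.g. "10.10.0.0/16" (hypothetical here).
    """
    network = ipaddress.ip_network(vpc_cidr)
    for name, addr in interfaces.items():
        ip = ipaddress.ip_address(addr)
        # Loopback and link-local addresses are never the cluster address.
        if ip.is_loopback or ip.is_link_local:
            continue
        if ip in network:
            return name, addr
    raise LookupError(f"no interface address inside {vpc_cidr}")


# Example values shaped like the ones in the post.
interfaces = {
    "lo": "127.0.0.1",
    "ecs-eth0": "169.254.172.28",
    "eth0": "10.10.1.128",
}
print(pick_vpc_address(interfaces, "10.10.0.0/16"))
```

This only shows how the intended address can be told apart from the internal ECS one; it doesn’t change which interface Neo4j actually binds to.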
Setting causal_clustering.discovery_listen_address=10.10.x.x:5000 doesn’t work either. Even when I can see the correct bind info from lsof:
TCP ip-10-10-1-72.ec2.internal:5000 (LISTEN)
and in the connection info log:
Discovery: listen=10.10.1.72:5000, advertised=10.10.1.72:5000
it still doesn’t work (the nodes simply don’t discover each other, even though I can reach port 5000 with nc), and it is a complete mystery to me why.
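The nc reachability check can be repeated from each node with a few lines of Python. This is a minimal sketch using only the standard library; the function name is hypothetical.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, can_connect("10.10.1.72", 5000) could be run from another cluster member. A successful connection only shows that the port is reachable over TCP, not that discovery itself works, which matches what I see.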
Thanks!