05-23-2019 02:21 PM
Hi everyone! I'm having an issue with a 3-instance cluster I'm trying to create. According to the logs the cluster seems to be created correctly and the neo4j processes are running, but none of the members accept HTTP or Bolt connections.
These are the last entries in the logs from all the members (only the IPs differ):
2019-05-23 21:00:12.413+0000 INFO [c.n.c.d.SslHazelcastCoreTopologyService] Cluster discovery service starting
2019-05-23 21:00:12.438+0000 INFO [c.n.c.d.SslHazelcastCoreTopologyService] My connection info: [
Discovery: listen=0.0.0.0:5000, advertised=172.31.0.78:5000,
Transaction: listen=0.0.0.0:6000, advertised=172.31.0.78:6000,
Raft: listen=0.0.0.0:7000, advertised=172.31.0.78:7000,
Client Connector Addresses: bolt://172.31.0.78:7687,http://172.31.0.78:7474,https://172.31.0.78:7473
]
2019-05-23 21:00:12.438+0000 INFO [c.n.c.d.SslHazelcastCoreTopologyService] Discovering other core members in initial members set: [172.31.0.76:5000, 172.31.0.77:5000, 172.31.0.78:5000]
2019-05-23 21:00:12.482+0000 INFO [o.n.c.c.c.l.s.SegmentedRaftLog] log started with recovered state State{prevIndex=-1, prevTerm=-1, appendIndex=-1}
2019-05-23 21:00:12.482+0000 INFO [o.n.c.c.c.m.RaftMembershipManager] Membership state before recovery: RaftMembershipState{committed=null, appended=null, ordinal=-1}
2019-05-23 21:00:12.483+0000 INFO [o.n.c.c.c.m.RaftMembershipManager] Recovering from: -1 to: -1
2019-05-23 21:00:12.484+0000 INFO [o.n.c.c.c.m.RaftMembershipManager] Membership state after recovery: RaftMembershipState{committed=null, appended=null, ordinal=-1}
2019-05-23 21:00:12.484+0000 INFO [o.n.c.c.c.m.RaftMembershipManager] Target membership: []
2019-05-23 21:00:12.557+0000 INFO [o.n.c.n.Server] raft-server: bound to 0.0.0.0:7000
2019-05-23 21:00:21.146+0000 INFO [c.n.c.d.SslHazelcastCoreTopologyService] Cluster discovery service started
2019-05-23 21:00:21.173+0000 INFO [o.n.c.d.CoreMonitor] Bound to cluster with id f2123d79-01c7-4bdf-a118-b96ebe5dc762
2019-05-23 21:00:21.320+0000 INFO [c.n.c.d.SslHazelcastCoreTopologyService] Core topology changed {added=[{memberId=MemberId{16939e8d}, info=CoreServerInfo{raftServer=172.31.0.76:7000, catchupServer=172.31.0.76:6000, clientConnectorAddresses=bolt://172.31.0.76:7687,http://172.31.0.76:7474,https://172.31.0.76:7473, groups=[], database=default, refuseToBeLeader=false}}, {memberId=MemberId{3c8b291f}, info=CoreServerInfo{raftServer=172.31.0.78:7000, catchupServer=172.31.0.78:6000, clientConnectorAddresses=bolt://172.31.0.78:7687,http://172.31.0.78:7474,https://172.31.0.78:7473, groups=[], database=default, refuseToBeLeader=false}}], removed=[]}
2019-05-23 21:00:21.320+0000 INFO [o.n.c.c.c.m.RaftMembershipManager] Target membership: [MemberId{16939e8d}, MemberId{3c8b291f}]
2019-05-23 21:00:21.328+0000 INFO [o.n.c.d.CoreMonitor] Discovered core member at 172.31.0.76:5000
2019-05-23 21:00:26.074+0000 INFO [o.n.c.d.CoreMonitor] Discovered core member at 172.31.0.77:5000
2019-05-23 21:00:26.077+0000 INFO [c.n.c.d.SslHazelcastCoreTopologyService] Core topology changed {added=[{memberId=MemberId{1a32017e}, info=CoreServerInfo{raftServer=172.31.0.77:7000, catchupServer=172.31.0.77:6000, clientConnectorAddresses=bolt://172.31.0.77:7687,http://172.31.0.77:7474,https://172.31.0.77:7473, groups=[], database=default, refuseToBeLeader=false}}], removed=[]}
2019-05-23 21:00:26.077+0000 INFO [o.n.c.c.c.m.RaftMembershipManager] Target membership: [MemberId{16939e8d}, MemberId{1a32017e}, MemberId{3c8b291f}]
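To check quickly whether every member ended up with the same membership view, the "Target membership" lines can be pulled out of each log. The following is only a sketch: the function name is made up for illustration, and it assumes the log lines keep the exact format shown in the excerpt above.

```python
import re
from typing import List

# Patterns based on the "Target membership: [...]" lines in the excerpt above.
MEMBERSHIP_RE = re.compile(r"Target membership: \[(.*?)\]")
MEMBER_RE = re.compile(r"MemberId\{([0-9a-f]+)\}")


def latest_target_membership(log_text: str) -> List[str]:
    """Return the member ids from the last 'Target membership' line (hypothetical helper)."""
    members: List[str] = []
    for line in log_text.splitlines():
        match = MEMBERSHIP_RE.search(line)
        if match:
            # Later lines overwrite earlier ones, so the last one wins.
            members = MEMBER_RE.findall(match.group(1))
    return members
```

Running it over the log text of each instance and comparing the results shows whether all three members agree on a membership of three. A full target membership only tells you that discovery worked; it does not prove that the client connectors have started.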
These are the cluster configs:
dbms.connectors.default_listen_address=0.0.0.0
dbms.connectors.default_advertised_address=172.31.0.78
dbms.mode=CORE
causal_clustering.minimum_core_cluster_size_at_formation=3
causal_clustering.minimum_core_cluster_size_at_runtime=3
causal_clustering.initial_discovery_members=172.31.0.76:5000,172.31.0.77:5000,172.31.0.78:5000
When I run cypher-shell in any of the instances:
# cypher-shell
Connection refused
Neo4j status in all the members says it's not running:
# neo4j status
Neo4j is not running
And yet I can see in all the instances that neo4j IS running. What else can I check?
Thanks! Any help is appreciated!
05-24-2019 03:27 PM
Hi there,
From one member of the cluster, are you able to telnet to another cluster member on the bolt port?
Note: this won't be usable, but you'll see if you can make the connection.
i.e.,
telnet 172.31.0.78 7687
Does that connect?
I'm trying to see whether a firewall on each machine is blocking outside connections. The members certainly appear to be able to connect to port 5000 on each machine.
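If telnet is not installed on the machines, the same reachability check can be approximated with a few lines of Python. This is a minimal sketch using only the standard library; the function name is hypothetical, and a successful connect only shows that something is listening on the port, not that Neo4j is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

For example, `port_is_open("172.31.0.78", 7687)` run from another member corresponds to the telnet command above. Calling it for ports 5000, 6000 and 7000 (the discovery, transaction and Raft ports from the log) checks the clustering ports in the same way.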
Cheers,
-Ryan
05-26-2019 03:08 AM
Hi Ryan, I wasn't able to telnet to the Bolt port. That wasn't because a firewall was blocking the requests, though, but because no service was running on that port.
This leads me to believe that somehow neo4j is not listening on the bolt port. I've tried restarting but still no luck.
Any idea where I could look next?
Thanks!
05-27-2019 07:21 AM
Hi Cesar,
The instances won't start accepting requests (listening on the Bolt port) until all the clustering communication is working and the cluster has formed. I suspect that a networking or configuration issue is preventing the cluster from forming and communicating correctly. Can you attach or post the debug.log from all three instances?
Kind Regards,
Dave
05-28-2019 02:40 AM
Hi David, these are the debug.log files for the 3 instances.
https://pastebin.com/z4QUiiYE
https://pastebin.com/nh3q2PjC
https://pastebin.com/4bTmD1pi
I think for now I'll just try to recreate the whole cluster from the beginning, re-importing the data and all. It's a 1.3 TB database, so it's going to take a while.
Thanks for taking the time to look into this.
Cesar
10-25-2019 02:50 AM
Hi Cesar,
Can you please share your active neo4j.conf entries? I am facing the same issue: cluster formation completed, but the HTTP and Bolt listeners are not started.
java 163405 neo4j 295u IPv4 896640 0t0 TCP xd1c:5000 (LISTEN)
java 163405 neo4j 346u IPv4 896659 0t0 TCP xd1c:7000 (LISTEN)
java 163405 neo4j 368u IPv4 895767 0t0 TCP xd1c:56647->xd1c4834665n4ja:5000 (ESTABLISHED)
java 163405 neo4j 369u IPv4 895864 0t0 TCP xd1c4834665n4jb:5000->xd1c4834665n4jc:52004 (ESTABLISHED)
I'm expecting port 7474 to be in the LISTEN state as well.
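A listing like the one above can be scanned for listening ports automatically. The sketch below assumes lsof-style lines ending in "TCP host:port (LISTEN)", as in this post; the function name is made up for illustration.

```python
import re
from typing import Set

# Matches e.g. "TCP xd1c:5000 (LISTEN)" and captures the port number.
LISTEN_RE = re.compile(r"TCP \S+:(\d+) \(LISTEN\)")


def listening_ports(listing: str) -> Set[int]:
    """Return the set of ports that appear in LISTEN state (hypothetical helper)."""
    ports: Set[int] = set()
    for line in listing.splitlines():
        match = LISTEN_RE.search(line)
        if match:
            ports.add(int(match.group(1)))
    return ports
```

With the four lines above as input, the result contains 5000 and 7000 but neither 7474 nor 7687, which matches the symptom: the clustering ports are up while the client connectors are not.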
10-25-2019 07:29 AM
Hi David, I'm really sorry I can't help you. We are not using clustering any more, and this was so long ago that I can't remember exactly how I solved the problem. I do remember that I solved it, and I think it was related to the firewall: more than one port had to be opened. There were 3 or 4 additional ports that needed to be open for the clustering to work. But don't take my word for it.
12-04-2019 06:38 AM
Hi Cesar
The issue was resolved long ago. The main problem was that the graph databases on the cluster nodes were not in sync from the initial setup. Thanks for your reply.