Neo4j

rajsenthil · 10-31-2020

I am using the Helm charts to create a cluster with the following values:

acceptLicenseAgreement: "yes"
neo4jPassword: "mysecret"

core:
  standalone: false
  numberOfServers: 2
  persistentVolume:
    ## whether or not persistence is enabled
    ##
    enabled: true
    ## core server data Persistent Volume mount root path
    ##
    mountPath: /data

    ## core server data Persistent Volume size
    ##
    size: 250Mi
  discoveryService:
    type: ClusterIP
    annotations: {}
    labels: {}
    loadBalancerSourceRanges:
    # Controls how many services get created. Usually want to over-provision so cores can
    # scale up for things like rolling upgrades.
    instances: [0, 1]
    standaloneOnly: [0]

readReplica:
  numberOfServers: 0
I can see that the Kubernetes services service/discovery-neo4j-neo4j-0 and service/discovery-neo4j-neo4j-1 are created and use ports 5000/TCP, 6000/TCP, 7000/TCP and 3637/TCP.

The pods pod/neo4j-neo4j-core-0 and pod/neo4j-neo4j-core-1 are not running; they are waiting with this message:
2020-10-31 16:43:19.091+0000 INFO Database 'system' is waiting for a total of 3 core members...

I checked the neo4j.conf file in this pod; the relevant settings are below:

causal_clustering.transaction_advertised_address=discovery-neo4j-neo4j-0.dev-namespace.svc.cluster.local:6000
causal_clustering.raft_advertised_address=discovery-neo4j-neo4j-0.dev-namespace.svc.cluster.local:7000
causal_clustering.minimum_core_cluster_size_at_runtime=2
causal_clustering.minimum_core_cluster_size_at_formation=3
causal_clustering.kubernetes.service_port_name=tcp-discovery
causal_clustering.kubernetes.label_selector=neo4j.com/cluster=neo4j-neo4j,neo4j.com/role=CORE,neo4j.com/coreindex in (0, 1, 2)
causal_clustering.discovery_type=K8S
causal_clustering.discovery_advertised_address=discovery-neo4j-neo4j-0.dev-namespace.svc.cluster.local:5000

Any idea why the pods are not starting and cannot resolve the service?

rajsenthil · 10-31-2020

In the log, I can see that the service is not reachable, even though it is running and listening on the port:

2020-10-31 15:37:31.442+0000 WARN [a.s.Materializer] [outbound connection to [akka://cc-discovery-actor-system@discovery-neo4j-neo4j-2.dev-namespace.svc.cluster.local:5000], message stream] Upstream failed, cause: StreamTcpException: Tcp command [Connect(discovery-neo4j-neo4j-2.dev-namespace.svc.cluster.local:5000,None,List(),Some(10000 milliseconds),true)] failed because of java.net.UnknownHostException: discovery-neo4j-neo4j-2.dev-namespace.svc.cluster.local

rajsenthil · 10-31-2020

The service details are

kubectl describe service/discovery-neo4j-neo4j-2
Name:              discovery-neo4j-neo4j-2
Namespace:         dev-namespace
Labels:            app.kubernetes.io/component=core
                   app.kubernetes.io/instance=neo4j
                   app.kubernetes.io/managed-by=Helm
                   app.kubernetes.io/name=neo4j
                   helm.sh/chart=neo4j-4.1.3-1
                   neo4j.com/bolt=false
                   neo4j.com/cluster=neo4j-neo4j
                   neo4j.com/coreindex=2
                   neo4j.com/http=false
                   neo4j.com/role=CORE
Annotations:       meta.helm.sh/release-name: neo4j
                   meta.helm.sh/release-namespace: dev-namespace
Selector:          statefulset.kubernetes.io/pod-name=neo4j-neo4j-core-2
Type:              ClusterIP
IP:                None
Port:              tcp-discovery  5000/TCP
TargetPort:        5000/TCP
Endpoints:         <none>
Port:              tcp-transaction  6000/TCP
TargetPort:        6000/TCP
Endpoints:         <none>
Port:              tcp-raft  7000/TCP
TargetPort:        7000/TCP
Endpoints:         <none>
Port:              tcp-jmx  3637/TCP
TargetPort:        3637/TCP
Endpoints:         <none>
Session Affinity:  None
Events:            <none>
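Every port in this output shows Endpoints: <none>. One likely explanation, offered as an assumption: the service selects statefulset.kubernetes.io/pod-name=neo4j-neo4j-core-2, but with numberOfServers: 2 the StatefulSet only creates the ordinals 0 and 1 (neo4j-neo4j-core-0 and neo4j-neo4j-core-1). Nothing backs this headless service (IP: None), so cluster DNS has no address to return for its name, which would be consistent with the UnknownHostException in the log above. The Python sketch below models that check. find_orphan_discovery_services is a hypothetical helper, and the naming patterns are inferred from the names in this thread for a release called neo4j.

```python
from typing import Iterable, List


def core_pod_names(release: str, number_of_servers: int) -> List[str]:
    """StatefulSet pods get ordinals 0..n-1: <release>-neo4j-core-<ordinal> (assumed pattern)."""
    return [f"{release}-neo4j-core-{i}" for i in range(number_of_servers)]


def find_orphan_discovery_services(
    release: str, discovery_indices: Iterable[int], number_of_servers: int
) -> List[str]:
    """Return discovery services whose selected pod does not exist (hypothetical helper).

    Assumes service discovery-<release>-neo4j-<i> selects pod <release>-neo4j-core-<i>,
    as in the `kubectl describe` output in this thread.
    """
    existing_pods = set(core_pod_names(release, number_of_servers))
    orphans = []
    for i in discovery_indices:
        target_pod = f"{release}-neo4j-core-{i}"
        if target_pod not in existing_pods:
            # No pod matches the selector, so the service has no endpoints.
            orphans.append(f"discovery-{release}-neo4j-{i}")
    return orphans


if __name__ == "__main__":
    # Discovery services 0, 1 and 2 exist, but only two core pods are created.
    print(find_orphan_discovery_services("neo4j", [0, 1, 2], number_of_servers=2))
    # prints ['discovery-neo4j-neo4j-2']
```

In a live cluster, comparing the output of kubectl get endpoints with kubectl get pods gives the same information directly.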

terryfranklin82 · 10-31-2020

Isn't that the same single pod?

It's tough to diagnose this, but have you by chance deployed multiple times? For example, why is your service called discovery-neo4j-neo4j-2? Was there also a discovery-neo4j-neo4j-0 and/or discovery-neo4j-neo4j-1?

Separately, I would increase your number of servers to 3 to give you at least some fault tolerance.
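The reason 3 is the practical minimum: Neo4j core servers replicate writes with Raft, which needs a majority of cores to commit. A minimal Python sketch of the arithmetic (the helper names are illustrative only):

```python
def majority(cores: int) -> int:
    """Smallest number of cores that forms a Raft majority."""
    return cores // 2 + 1


def tolerated_failures(cores: int) -> int:
    """Cores that can be lost while a majority is still available."""
    return cores - majority(cores)


for n in (2, 3, 5):
    print(f"{n} cores: majority={majority(n)}, tolerated failures={tolerated_failures(n)}")
# prints:
# 2 cores: majority=2, tolerated failures=0
# 3 cores: majority=2, tolerated failures=1
# 5 cores: majority=3, tolerated failures=2
```

With two cores, losing either one leaves no majority, so two cores tolerate no more failures than a single core does.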

rajsenthil · 11-01-2020

It was a typo: I had pasted pod/neo4j-neo4j-core-0 twice.
I also increased numberOfServers to 3, and it all works now. Thank you for the suggestion; I marked the reply as the solution.

EarthlingDavey · 06-09-2021

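I think the cause is that one of the values in common-configmap.yaml is NEO4J_causal__clustering_minimum__core__cluster__size__at__formation: "3". That setting requires three core members before the cluster can form, so with numberOfServers: 2 the cores keep waiting for a third member.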
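The neo4j.conf settings posted in the question show the same mismatch. The Python sketch below is a hypothetical diagnostic, not part of Neo4j or the Helm chart: it assumes plain key=value lines like the ones in that post and compares minimum_core_cluster_size_at_formation with the number of core servers deployed.

```python
from typing import Dict

# A subset of the settings from the question's neo4j.conf.
CONF = """\
causal_clustering.minimum_core_cluster_size_at_runtime=2
causal_clustering.minimum_core_cluster_size_at_formation=3
causal_clustering.discovery_type=K8S
"""


def parse_conf(text: str) -> Dict[str, str]:
    """Parse neo4j.conf-style key=value lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only; values such as label selectors contain '='.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def formation_problem(settings: Dict[str, str], number_of_servers: int) -> str:
    """Return a message if the cluster can never reach its formation size, else ''."""
    # Fall back to 3 (assumed default) if the key is absent.
    needed = int(settings.get("causal_clustering.minimum_core_cluster_size_at_formation", "3"))
    if number_of_servers < needed:
        return f"formation needs {needed} cores but only {number_of_servers} are deployed"
    return ""


print(formation_problem(parse_conf(CONF), number_of_servers=2))
# prints formation needs 3 cores but only 2 are deployed
```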

If you want to keep the number of servers at 2, e.g. in a development environment, try this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: neo4j-cm
  namespace: neo4j
data:
  NEO4J_causal__clustering_minimum__core__cluster__size__at__formation: "2"
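The long variable name follows the Neo4j Docker image convention for passing configuration settings as environment variables: prefix with NEO4J_, write each underscore in the setting name as a double underscore, and write each period as a single underscore. A small Python sketch of that mapping (the function name is illustrative):

```python
def setting_to_env_var(setting: str) -> str:
    """Map a neo4j.conf setting name to its Docker environment-variable form."""
    # Double the existing underscores first, so the '.' -> '_' step stays unambiguous.
    return "NEO4J_" + setting.replace("_", "__").replace(".", "_")


print(setting_to_env_var("causal_clustering.minimum_core_cluster_size_at_formation"))
# prints NEO4J_causal__clustering_minimum__core__cluster__size__at__formation
```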

and

values:
  envFrom:
    - configMapRef:
        name: neo4j-cm
  core:
    numberOfServers: 2

Helm charts to create Neo4j cluster