08-22-2022 10:16 AM
Hi Folks,
I hope someone can point me in the right direction to find the root cause of this error. I have an Azure k8s (AKS) instance of Neo4j, which I installed with Helm using minimal custom config, with an Azure disk as the data volume.
Devs are using it for evaluation, and today I spent hours troubleshooting to get it back up and running after the devs resized the Azure disk to 256 GB.
The pod is in a crash loop and doesn't start at all; it fails with the following events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 24m default-scheduler Successfully assigned neo4j-ee-stn/neo4j-ee-stn-release-0 to aks-neo4j-33193140-vmss00002g
Warning Unhealthy 23m (x6 over 24m) kubelet Startup probe failed: dial tcp 10.244.0.15:7687: connect: connection refused
Normal Pulled 23m (x4 over 24m) kubelet Container image "neo4j:4.4.5-enterprise" already present on machine
Normal Created 23m (x4 over 24m) kubelet Created container neo4j
Normal Started 23m (x4 over 24m) kubelet Started container neo4j
Warning BackOff 4m16s (x100 over 24m) kubelet Back-off restarting failed container
SSL is disabled, and I have also explicitly increased the startupProbe failureThreshold.
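For reference, the override I used looked roughly like this. This is a sketch of a Helm values fragment; the exact key paths depend on the Neo4j chart version, so check the chart's own values.yaml before applying:

```yaml
# values.yaml fragment (assumed key paths -- verify against your chart version)
startupProbe:
  failureThreshold: 1000
  periodSeconds: 5
```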
Any ideas on how to tackle this one, please?
Thanks!
Emil
08-24-2022 11:39 PM - edited 08-25-2022 12:02 AM
Hi @TrevorS,
I appreciate your response.
It turned out the Neo4j pod could not mount the newly resized disk; that's why it was failing.
The startup probe was failing correctly, but I didn't even consider that the disk could be the issue (despite pod volume mount errors) until I tried to mount it on a VM and got more specific mount errors. I thought it was something related to the Neo4j config, but my assumptions were wrong.
Neo4j pod volume mount error:
Warning FailedMount 81s kubelet
Unable to attach or mount volumes:
unmounted volumes=[data], unattached volumes=[neo4j-conf data kube-api-access-fmwws]: timed out waiting for the condition
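In hindsight, a few kubectl commands surface these mount errors quickly. The pod and namespace names below are taken from this thread; the PVC name is an assumption based on the usual StatefulSet naming convention, so adjust for your release:

```shell
# Inspect pod events, including FailedMount warnings
kubectl describe pod neo4j-ee-stn-release-0 -n neo4j-ee-stn

# Check the PVC/PV backing the "data" volume
kubectl get pvc -n neo4j-ee-stn
kubectl describe pvc data-neo4j-ee-stn-release-0 -n neo4j-ee-stn  # PVC name is an assumption

# Logs from the previous (crashed) container, if it got far enough to start
kubectl logs neo4j-ee-stn-release-0 -n neo4j-ee-stn --previous
```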
Trying to mount the Azure disk under Ubuntu:
mount: /neo4j: wrong fs type, bad option, bad superblock on /dev/sdd, missing codepage or helper program, or other error.
Trying to repair the disk under Ubuntu shows:
fsck from util-linux 2.34
e2fsck 1.45.5 (07-Jan-2020)
ext2fs_open2: Bad magic number in super-block
fsck.ext2: Superblock invalid, trying backup blocks...
/dev/sdd: recovering journal
fsck.ext2: unable to set superblock flags on /dev/sdd
/dev/sdd: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sdd: ********** WARNING: Filesystem still has errors **********
I hope this helps someone else with a similar issue. However, I would be interested to hear if someone has experience with disk tools that could help recover from this kind of Azure disk failure.
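One avenue worth trying before replacing the disk is e2fsck with a backup superblock, which is what the "trying backup blocks..." line in the fsck output hints at. Below is a minimal sketch on a loopback image file, not the actual Azure disk; the backup block number 8193 applies to small filesystems with 1 KiB blocks, so on a real disk first run `mke2fs -n` (a dry run that writes nothing) against the device to list where its backup superblocks live:

```shell
# Create a small ext4 image to stand in for the broken disk
dd if=/dev/zero of=disk.img bs=1M count=16 status=none
mkfs.ext4 -q -F disk.img

# Dry run (-n): prints the backup superblock locations without writing anything
mke2fs -n -F disk.img | grep -A1 "Superblock backups"

# Simulate the corruption: zero out the primary superblock (lives at offset 1024)
dd if=/dev/zero of=disk.img bs=1024 seek=1 count=1 conv=notrunc status=none

# Repair using a backup superblock (block 8193 for 1 KiB-block filesystems)
e2fsck -b 8193 -y disk.img

# Verify the filesystem is clean again (-f force check, -n read-only)
e2fsck -fn disk.img && echo "filesystem recovered"
```

If the backup superblocks are also damaged, as seems to have happened here ("unable to set superblock flags"), replacing the disk and restoring from backup is the safer path.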
Obviously the solution was to replace the Azure disk and to run helm upgrade ...
Thanks again!
Emil
08-24-2022 12:00 PM
Hello @emil
I did some searching and found a similar Stack Overflow question about Kubernetes probe errors like this one.
Here is what I was able to find:
https://stackoverflow.com/questions/61303668/kubernetes-readiness-probe-failed-dial-tcp-10-244-0-105...
I hope this helps!