01-16-2020 04:33 AM
We have a cluster of 3 servers (normal config, leader is read/write, 2 followers are read).
We're having an issue with WRITE queries not completing; they eventually return an OOM exception.
The Cypher (with labels and props obfuscated):
UNWIND ["01", "02", "03", "04", "05", "06", "07", "08", "09"] as pnum
MATCH (:Thing {thingUniqueNumber:pnum})-[r:REL_TO_DELETE]->()
RETURN pnum, r
This returns in 13ms with 27 results.
Replacing the RETURN with DELETE r makes the query run for hours, until an OOM message is reported in the log (I'll update with the actual message if I can).
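For reference, the write variant described here would presumably read as follows. This is only a sketch, assuming the sole change is swapping the RETURN clause for DELETE r; the labels and properties are the same obfuscated ones as in the read query.

```cypher
// Same lookup as the read query; only the final clause differs.
UNWIND ["01", "02", "03", "04", "05", "06", "07", "08", "09"] AS pnum
MATCH (:Thing {thingUniqueNumber: pnum})-[r:REL_TO_DELETE]->()
// Assumed replacement for "RETURN pnum, r"
DELETE r
```

Since the read version matches only 27 relationships, the delete itself should be small, which fits the later suspicion that something other than the data volume was holding it up.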
Possibly related: one of our followers wasn't picking up the heartbeat or receiving replicated transactions from the leader after being restarted earlier in the week. We cleared the cluster_state folder, as that was the recommended solution.
Thanks for any help!
-Mike French
01-16-2020 07:27 AM
A reboot of the cluster has resolved the issue. Further investigation returned this:
ERROR LEAK: ByteBuf.release() was not called before it's garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.
What MAY have caused that:
There were 2 sessions, one adding labels and one removing a different label from the same subset of nodes. That may have caused lock contention.
01-16-2020 09:32 AM
Hello Mike,
Which version are you using?
Are you able to reproduce it?
01-16-2020 10:14 AM
Enterprise version 3.5.5, and we haven't been able to reproduce it yet.
01-18-2020 10:10 PM
Can you try to update to the latest maintenance release available?
Did you notice any GC pauses on any node of the cluster?
You should have messages like this in debug.log: Detected VM stop-the-world
as mentioned in "Fatal error occurred when handling a client connection causes crash",
or
in neo4j.log: java.lang.OutOfMemoryError: Java heap space