Neo4j

mdfrenchman · ‎01-16-2020

We have a cluster of 3 servers (normal config: the leader is read/write, the 2 followers are read-only).

We're having an issue with WRITE queries not completing; they eventually fail with an OOM exception.

The Cypher (with labels and props obfuscated):

UNWIND ["01", "02", "03", "04", "05", "06", "07", "08", "09"] as pnum
MATCH (:Thing {thingUniqueNumber:pnum})-[r:REL_TO_DELETE]->()
RETURN pnum, r

returns in 13ms with 27 results.

Replacing the RETURN with DELETE r makes the query run for hours until a log message reporting an OOM appears (I'll update with the actual message if I can).

Possibly related: after being restarted earlier in the week, one of our followers wasn't picking up the heartbeat or receiving replicated transactions from the leader. The cluster_state folder was cleared, as that was the recommended solution.

Thanks for any help!

-Mike French

mdfrenchman · ‎01-16-2020

A reboot of the cluster has resolved the issue. Further investigation returned this:
ERROR LEAK: ByteBuf.release() was not called before it's garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.

What MAY have caused that:
There were 2 sessions, one adding labels and one removing a different label from the same subset of nodes. That may have caused lock contention.

jeremie · ‎01-16-2020

Hello Mike,

Which version are you using?
Are you able to reproduce it?

mdfrenchman · ‎01-16-2020

Enterprise version 3.5.5, and we haven't been able to reproduce it yet.

jeremie · ‎01-18-2020

Can you try updating to the latest maintenance release available?
Did you notice any GC pauses on any node of the cluster?
You should see messages like this in debug.log: Detected VM stop-the-world
(as mentioned in the thread "Fatal error occurred when handling a client connection causes crash")
or, in neo4j.log: java.lang.OutOfMemoryError: Java heap space
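
If it helps, here is a rough Python sketch for checking a node's logs for the two messages mentioned above. It is only an illustration: the scan_logs helper and the MARKERS table are hypothetical names, the log directory is whatever your installation uses, and the match is a plain substring search (the OutOfMemoryError marker is shortened to its class name so that other OOM variants are caught too).

```python
from pathlib import Path

# Substrings taken from the messages quoted above; file names follow the post.
# Assumption: the OOM marker is shortened to the exception class name.
MARKERS = {
    "debug.log": "Detected VM stop-the-world",
    "neo4j.log": "java.lang.OutOfMemoryError",
}


def scan_logs(log_dir):
    """Return {file name: [matching lines]} for each log file found in log_dir."""
    hits = {}
    for file_name, marker in MARKERS.items():
        path = Path(log_dir) / file_name
        if not path.is_file():
            continue  # this node may not have the file in this directory
        with path.open(encoding="utf-8", errors="replace") as handle:
            hits[file_name] = [
                line.rstrip("\n") for line in handle if marker in line
            ]
    return hits


if __name__ == "__main__":
    import sys

    # Assumed usage: pass the node's log directory as the first argument.
    log_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, lines in scan_logs(log_dir).items():
        print(f"{name}: {len(lines)} matching line(s)")
        for line in lines:
            print("  " + line)
```

Run it once per cluster member against that member's log directory; a file with zero matching lines is still reported, so you can tell "no match" apart from "file not found".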

Cluster not processing write cypher