Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
09-04-2019 03:30 PM
I recently upgrade our Neo4j Enterprise to v3.5.9. The problem is that since the DB started, any write query hangs for 60seconds and the query is killed by the runtime (We have dbms.transaction.timeout=60s in the config)
Even this simple query will never complete:
create (:Foo)
Or one with a SET statament:
match(n:Node)
where n.uuid = $uuid
set n.updateDate = "2019-09-04T22:27:31.516Z"
On the other hand, read queries are fine. Only writes are hanging.
I see a lot of lines like this on my debug.log:
09-05-2019 11:48 AM
Please attach a snippet (as text) of your debug.log
showing a startup sequence
09-05-2019 12:32 PM
How do you execute those write queries?
can you call dbms.listQueries()
when they hang and share the output?
09-05-2019 01:21 PM
I ran dbms.listQueries right after create (:Foo)
and I had this output.
queryId,username,metaData,query,parameters,planner,runtime,indexes,startTime,protocol,clientAddress,requestUri,status,resourceInformation,activeLockCount,elapsedTimeMillis,cpuTimeMillis,waitTimeMillis,idleTimeMillis,allocatedBytes,pageHits,pageFaults,connectionId
query-15000,neo4j,"{type:user-direct,app:neo4j-browser_v3.2.20}",call dbms.listQueries(),{},procedure,procedure,[],2019-09-05T20:20:11.987Z,bolt,190.92.63.238:58578,172.31.24.237:7687,running,{},0,0,null,0,null,null,0,0,bolt-275
09-05-2019 01:15 PM
Sure. I'll share my neo4j.log as well.
This is my neo4j,log after starting and long time after the query was executed
2019-09-05 02:58:37.995+0000 INFO ======== Neo4j 3.5.9 ========
2019-09-05 02:58:38.006+0000 INFO Starting...
2019-09-05 02:58:39.487+0000 INFO Initiating metrics...
2019-09-05 02:58:43.321+0000 INFO GraphAware Runtime enabled, bootstrapping...
2019-09-05 02:58:43.357+0000 INFO Bootstrapping module with order 1, ID UIDM, using com.graphaware.module.uuid.UuidBootstrapper
2019-09-05 02:58:43.544+0000 INFO Relationship Inclusion Policy set to com.graphaware.common.policy.inclusion.composite.CompositeRelationshipInclusionPolicy@e12e95a1
2019-09-05 02:58:43.554+0000 INFO Registering module UIDM with GraphAware Runtime.
2019-09-05 02:58:43.556+0000 INFO GraphAware Runtime bootstrapped, starting the Runtime...
2019-09-05 02:58:43.724+0000 INFO Deprecated index providers in use: lucene-1.0 (34 indexes). Use procedure 'db.indexes()' to see what indexes use which index provider.
2019-09-05 02:58:44.008+0000 INFO Starting GraphAware...
2019-09-05 02:58:44.015+0000 INFO Loading module metadata...
2019-09-05 02:58:44.015+0000 INFO Loading metadata for module UIDM
2019-09-05 02:58:44.131+0000 INFO Module UIDM seems to have been registered for the first time.
2019-09-05 02:58:44.131+0000 INFO Module UIDM seems to have been registered for the first time, will try to initialize...
2019-09-05 02:58:44.135+0000 INFO InitializeUntil set to 9223372036854775807 and it is 1567652324135. Will initialize.
2019-09-05 02:59:05.875+0000 INFO Sending metrics to CSV file at /var/lib/neo4j/metrics
2019-09-05 02:59:07.031+0000 INFO Bolt enabled on 0.0.0.0:7687.
2019-09-05 02:59:08.909+0000 WARN Server thread metrics not available (missing neo4j.server.threads.jetty.all)
2019-09-05 02:59:08.911+0000 WARN Server thread metrics not available (missing neo4j.server.threads.jetty.idle)
2019-09-05 02:59:12.693+0000 INFO Started.
2019-09-05 02:59:13.611+0000 INFO Mounted REST API at: /db/manage
2019-09-05 02:59:13.665+0000 INFO started
2019-09-05 02:59:13.678+0000 INFO Mounted unmanaged extension [com.graphaware.server] at [/graphaware]
2019-09-05 02:59:13.890+0000 INFO Server thread metrics have been registered successfully
2019-09-05 02:59:14.300+0000 INFO Google Analytics enabled
2019-09-05 03:29:21.328+0000 WARN Exception while producing input! Transaction was marked as successful, but unable to commit transaction so rolled back.
org.neo4j.graphdb.TransactionFailureException: Transaction was marked as successful, but unable to commit transaction so rolled back.
at org.neo4j.kernel.impl.coreapi.TopLevelTransaction.close(TopLevelTransaction.java:99)
at com.graphaware.tx.executor.input.TransactionalInput.closeTx(TransactionalInput.java:118)
at com.graphaware.tx.executor.input.TransactionalInput.fetchNextOrNull(TransactionalInput.java:79)
at org.neo4j.helpers.collection.PrefetchingIterator.peek(PrefetchingIterator.java:60)
at org.neo4j.helpers.collection.PrefetchingIterator.hasNext(PrefetchingIterator.java:46)
at com.graphaware.tx.executor.batch.IterableInputBatchTransactionExecutor.lambda$populateQueue$0(IterableInputBatchTransactionExecutor.java:82)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.neo4j.graphdb.TransactionTerminatedException: The transaction has been terminated. Retry your operation in a new transaction, and you should see a successful result. The transaction has not completed within the specified timeout (dbms.transaction.timeout). You may want to retry with a longer timeout.
at org.neo4j.kernel.impl.api.KernelTransactionImplementation.failOnNonExplicitRollbackIfNeeded(KernelTransactionImplementation.java:639)
at org.neo4j.kernel.impl.api.KernelTransactionImplementation.closeTransaction(KernelTransactionImplementation.java:589)
at org.neo4j.internal.kernel.api.Transaction.close(Transaction.java:178)
at org.neo4j.kernel.impl.coreapi.TopLevelTransaction.close(TopLevelTransaction.java:77)
... 6 more
2019-09-05 03:37:56.266+0000 WARN The client is unauthorized due to authentication failure.
2019-09-05 05:43:22.664+0000 WARN Exception while producing input! Transaction was marked as successful, but unable to commit transaction so rolled back.
org.neo4j.graphdb.TransactionFailureException: Transaction was marked as successful, but unable to commit transaction so rolled back.
at org.neo4j.kernel.impl.coreapi.TopLevelTransaction.close(TopLevelTransaction.java:99)
at com.graphaware.tx.executor.input.TransactionalInput.closeTx(TransactionalInput.java:118)
at com.graphaware.tx.executor.input.TransactionalInput.fetchNextOrNull(TransactionalInput.java:79)
at org.neo4j.helpers.collection.PrefetchingIterator.peek(PrefetchingIterator.java:60)
at org.neo4j.helpers.collection.PrefetchingIterator.hasNext(PrefetchingIterator.java:46)
at com.graphaware.tx.executor.batch.IterableInputBatchTransactionExecutor.lambda$populateQueue$0(IterableInputBatchTransactionExecutor.java:82)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.neo4j.graphdb.TransactionTerminatedException: The transaction has been terminated. Retry your operation in a new transaction, and you should see a successful result. The transaction has not completed within the specified timeout (dbms.transaction.timeout). You may want to retry with a longer timeout.
at org.neo4j.kernel.impl.api.KernelTransactionImplementation.failOnNonExplicitRollbackIfNeeded(KernelTransactionImplementation.java:639)
at org.neo4j.kernel.impl.api.KernelTransactionImplementation.closeTransaction(KernelTransactionImplementation.java:589)
at org.neo4j.internal.kernel.api.Transaction.close(Transaction.java:178)
at org.neo4j.kernel.impl.coreapi.TopLevelTransaction.close(TopLevelTransaction.java:77)
... 6 more
2019-09-05 05:43:31.787+0000 INFO Creating fresh metadata for module UIDM.
2019-09-05 05:43:31.787+0000 INFO Module UIDM has not changed configuration since last run, already initialized.
Exception in thread "GraphAware Starter" org.neo4j.graphdb.TransactionFailureException: Transaction was marked as successful, but unable to commit transaction so rolled back.
at org.neo4j.kernel.impl.coreapi.TopLevelTransaction.close(TopLevelTransaction.java:103)
at com.graphaware.runtime.metadata.GraphPropertiesMetadataRepository.persistModuleMetadata(GraphPropertiesMetadataRepository.java:111)
at com.graphaware.runtime.metadata.GraphPropertiesMetadataRepository.persistModuleMetadata(GraphPropertiesMetadataRepository.java:98)
at com.graphaware.runtime.manager.BaseModuleManager.persistMetadata(BaseModuleManager.java:217)
at com.graphaware.runtime.manager.BaseModuleManager.loadMetadata(BaseModuleManager.java:158)
at com.graphaware.runtime.manager.BaseModuleManager.loadMetadata(BaseModuleManager.java:125)
at com.graphaware.runtime.TxDrivenRuntime.loadMetadata(TxDrivenRuntime.java:130)
at com.graphaware.runtime.ProductionRuntime.loadMetadata(ProductionRuntime.java:80)
at com.graphaware.runtime.BaseGraphAwareRuntime.startModules(BaseGraphAwareRuntime.java:154)
at com.graphaware.runtime.TxDrivenRuntime.startModules(TxDrivenRuntime.java:146)
at com.graphaware.runtime.ProductionRuntime.startModules(ProductionRuntime.java:70)
at com.graphaware.runtime.BaseGraphAwareRuntime.start(BaseGraphAwareRuntime.java:134)
at com.graphaware.runtime.bootstrap.RuntimeKernelExtension.lambda$start$0(RuntimeKernelExtension.java:117)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.neo4j.kernel.impl.store.InvalidRecordException: Property[0,used=false,prev=-1,next=-1, (blocks not loaded)] not in use
at org.neo4j.kernel.impl.store.record.RecordLoad.verify(RecordLoad.java:63)
at org.neo4j.kernel.impl.store.CommonAbstractStore.verifyAfterReading(CommonAbstractStore.java:1238)
at org.neo4j.kernel.impl.store.CommonAbstractStore.readRecordFromPage(CommonAbstractStore.java:1107)
at org.neo4j.kernel.impl.store.CommonAbstractStore.readIntoRecord(CommonAbstractStore.java:1057)
at org.neo4j.kernel.impl.store.CommonAbstractStore.getRecord(CommonAbstractStore.java:1024)
at org.neo4j.kernel.impl.storageengine.impl.recordstorage.Loaders$2.load(Loaders.java:182)
at org.neo4j.kernel.impl.storageengine.impl.recordstorage.Loaders$2.load(Loaders.java:162)
at org.neo4j.kernel.impl.transaction.state.RecordChanges.getOrLoad(RecordChanges.java:69)
at org.neo4j.kernel.impl.storageengine.impl.recordstorage.PropertyCreator.primitiveSetProperty(PropertyCreator.java:84)
at org.neo4j.kernel.impl.storageengine.impl.recordstorage.TransactionRecordState.graphAddProperty(TransactionRecordState.java:531)
at org.neo4j.kernel.impl.storageengine.impl.recordstorage.TransactionToRecordStateVisitor.visitGraphPropertyChanges(TransactionToRecordStateVisitor.java:149)
at org.neo4j.storageengine.api.txstate.TxStateVisitor$Delegator.visitGraphPropertyChanges(TxStateVisitor.java:218)
at org.neo4j.kernel.impl.api.state.TxState.accept(TxState.java:165)
at org.neo4j.kernel.impl.storageengine.impl.recordstorage.RecordStorageEngine.createCommands(RecordStorageEngine.java:272)
at org.neo4j.kernel.impl.api.KernelTransactionImplementation.commit(KernelTransactionImplementation.java:687)
at org.neo4j.kernel.impl.api.KernelTransactionImplementation.closeTransaction(KernelTransactionImplementation.java:594)
at org.neo4j.internal.kernel.api.Transaction.close(Transaction.java:178)
at org.neo4j.kernel.impl.coreapi.TopLevelTransaction.close(TopLevelTransaction.java:77)
... 13 more
This is my most recent debug.log. Its quite long so I put it on my Google Drive.
09-05-2019 01:23 PM
I think that Graphaware uuid plugin it's involve in this. I removed the plugin and its configs, then restarted the database and it works as usual. The problem is that we really depend on this plugin.
I am not sure if this is related but when I run sudo service neo4j stop
it timeout. I have to sudo kill -9 <pid>
to stop the database
09-05-2019 05:52 PM
using a kill -9 <neo4j pid>
is not recommended/encouraged. When you shut down Neo4j properly using a bin/neo4j stop
or similar this will cause Neo4j to exit gracefully and exit such that immediately before exiting it will checkpoint, i.e. write any updated data from the pagecache (i.e. the representation of graph data in RAM) to the actual files in disk at graph.db/neostore.*store.db.
If you kill -9 <neo4j pid>
this force kill has no opportunity to checkpoint and thus the updated data in the pagecache can not be written to graph.db/neostore.*store.db. However the data is not lost. Rather upon the next start, it detects that there are transactions in the transaction log which have not been commtted to graph.db/neostore.*store.db and this needs to commit these transactions. As such either you checkpoint via a graceful shutdown or you effectively have to checkpoint on the next startup.
09-05-2019 09:09 PM
Thanks @dana.canzano! Yeah, I've notice that the next time I start there are lines in the logs saying the database needs recovery. I use kill as my last option, only when I have a database that does not stop with the usual neo4j stop command.
09-06-2019 04:09 AM
APOC now also has some UUID support, perhaps that works better?
You can also use call dbms.listActiveLocks()
and call dbms.listTransactions()
for some more details on what happens when it hangs.
I think there might be some lock being held that keeps your query from progressing.
09-06-2019 09:48 AM
call dbms.listTransactions()
and call dbms.listActiveLocks()
did not provide useful information .
call dbms.listTransactions()
, shows three rows, one for the procedure and two almost empty rows. There is no queryId which I need in order to call dbms.listActiveLocks()
.
I studied the APOC support, I wish we could have a way to install handlers for all nodes and relationships at once. I understood we'd need to call apoc.uuid.install()
for each label.
09-06-2019 04:38 PM
So today I I tried with a fresh Neo4j. I configured the plugins and all good. Then I mounted a backup of my data (from the server that was failing) and the problem continued. That makes me think that there's something going on our data/store/indexes.
I know that Graphaware uuid uses explicit indexes, not sure if that could be involve.
I see a lot of this in my debug.log.
Memory?
2019-09-06 23:33:21.793+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=125, gcTime=129, gcCount=1}
2019-09-06 23:33:34.294+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=112, gcTime=116, gcCount=1}
2019-09-06 23:33:37.146+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=146, gcTime=149, gcCount=1}
2019-09-06 23:33:44.649+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=143, gcTime=146, gcCount=1}
2019-09-06 23:33:57.366+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=105, gcTime=137, gcCount=1}
2019-09-06 23:34:02.391+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=129, gcTime=146, gcCount=1}
2019-09-06 23:34:07.410+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=123, gcTime=124, gcCount=1}
2019-09-06 23:34:10.024+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=109, gcTime=129, gcCount=1}
I am not sure but I'll keep digging.
09-06-2019 09:48 PM
So, after startup I see two very long running transactions. They seem to be internal transactions because they do not have query-id. They have high values for page hits and page faults, the last I checked had 5810 page hits and 209 page faults.
I debugged the code for graphaware server and uuid and found that when the uuid module bootstraps it's calling all the nodes in the graph through a series of wrappers that finally invoke getAllNodes
method in GraphDatabaseService
class. This is where the process hangs. The result getAllNodes
is expected to execute a unit of work nothing is generated.
Now I increases the timeout for transactions, it used to be 1minute and now it's 1hour. I am crossing fingers to see it load successfully.
Here is where the uuid module initializes
Here it waits for the result of getAllNodes
. My logs point to the line 86 on this file.
09-09-2019 05:49 PM
@dmiranda2791 which version of Neo4j did you upgrade from? How many nodes do you have? And did they all have a uuid assigned before upgrade?
And finally, do you use Community or Enterprise?
09-09-2019 07:10 PM
Upgraded from 3.1.1, we have about 9 million nodes, they all had uuid assigned before and we are using enterprise edition.
I think it was an issue with the database store. We mounted a backup of our production store on the dev instance then we did the upgrade and all went good. Also the previous store use to take long times to startup.
Fortunately, we solved our problem .
09-09-2019 07:26 PM
Good to know, thanks
09-11-2019 09:41 AM
@Luanne_Misquitta, I am experiencing the same problem again on our dev DB , I am pretty sure it has something to do with the initialization of Graphaware uuid module. If I disable this module startup is fast and I can do write queries almost immediately.
When Uuid module is enable, Neo4j starts but I can't do any write queries. The last time I let it run for about 10 hours and I still couldn't.
I have a server with 30GB of ram, I have tested different heap and page cache configurations such as: 18GB for heap, 8 for page cache and the other way around 8GB for heap and 18 for page cache. My store in the version is about 13GB.
Each time I see the module initializing:
2019-09-11 16:19:51.064+0000 INFO Module UIDM seems to have been registered for the first time.
2019-09-11 16:19:51.064+0000 INFO Module UIDM seems to have been registered for the first time, will try to initialize...
And a lot of GC pauses on the debug.log
2019-09-11 16:33:11.966+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=109, gcTime=115, gcCount=1}
2019-09-11 16:34:57.481+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=104, gcTime=121, gcCount=1}
2019-09-11 16:35:31.614+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=108, gcTime=117, gcCount=1}
2019-09-11 16:36:10.861+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=115, gcTime=116, gcCount=1}
It'd be great if you can help me figure out what's going on.
09-10-2019 03:29 AM
do you by chance still have the old store around for forensics?
09-10-2019 10:55 AM
Yeah, I have an AWS image with the old version of the database before starting the upgrade. How would that forensics work be?
09-11-2019 03:37 PM
we would like to send it to engineering for analytics, if you can share it via DM with me.
04-24-2020 05:08 AM
I'm very disappointed with all these upgrades.
The new native relationship indexes are much slower than APOC, which has been deprecated.
Other (normal) requests that don't query relationship properties are also much slower:
All the sessions of the conference are now available online