

Neo4j 4.0 Memrec

Looks like memrec in Neo4j 4.0 is behaving a bit erratically with an empty database:

./bin/neo4j-admin memrec
# Memory settings recommendation from neo4j-admin memrec:
#
# Assuming the system is dedicated to running Neo4j and has 3.646GiB of memory,
# we recommend a heap size of around 1800m, and a page cache of around 8m,
# and that about 1400m is left for the operating system, and the native memory
# needed by Lucene and Netty.
#
# Tip: If the indexing storage use is high, e.g. there are many indexes or most
# data indexed, then it might advantageous to leave more memory for the
# operating system.
#
# Tip: Depending on the workload type you may want to increase the amount
# of off-heap memory available for storing transaction state.
# For instance, in case of large write-intensive transactions
# increasing it can lower GC overhead and thus improve performance.
# On the other hand, if vast majority of transactions are small or read-only
# then you can decrease it and increase page cache instead.
#
# Tip: The more concurrent transactions your workload has and the more updates
# they do, the more heap memory you will need. However, don't allocate more
# than 31g of heap, since this will disable pointer compression, also known as
# "compressed oops", in the JVM and make less effective use of the heap.
#
# Tip: Setting the initial and the max heap size to the same value means the
# JVM will never need to change the heap size. Changing the heap size otherwise
# involves a full GC, which is desirable to avoid.
#
# Based on the above, the following memory settings are recommended:
dbms.memory.heap.initial_size=1800m
dbms.memory.heap.max_size=1800m
dbms.memory.pagecache.size=8m
dbms.tx_state.max_off_heap_memory=955700k
#
# The numbers below have been derived based on your current databases located at: '/var/data/neo4j/databases'.
# They can be used as an input into more detailed memory analysis.
# Total size of lucene indexes in all databases: 0k
# Total size of data and native indexes in all databases: 424k

Calling out dbms.memory.pagecache.size=8m specifically - shouldn't this be a factor of the heap size of 1800m? This is based on a 4 GB c5.large EC2 instance.

1 ACCEPTED SOLUTION

I agree the pagecache size here seems far too low; I think we may need to establish a minimum to recommend even in the case of a small db, since it's highly unlikely it would stay that small. That said, you only have about 3.5 GB of memory on that system, so you don't have much to spare. Unless you are only going to be working with small graphs, you may want to consider upgrading to an instance with more memory.

The pagecache size does not have to be a factor of the heap size; the two are mostly unrelated. While the heap is basically a scratch area in memory for query execution, the pagecache is more closely correlated with the size of the database: keeping as much of the graph as possible in the pagecache is a way to minimize disk I/O. With insufficient pagecache coverage of the database you won't encounter memory errors, but you may end up with inefficient querying due to pagecache misses and trips to disk.
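As an illustration only (not the tool's recommendation), on a machine this size one could override memrec's output in conf/neo4j.conf with a larger pagecache floor. The specific values below are assumptions for a ~3.6 GiB box, using the same settings keys memrec emits:

# Sketch of manual overrides in conf/neo4j.conf (values are illustrative assumptions)
dbms.memory.heap.initial_size=1500m
dbms.memory.heap.max_size=1500m
# Reserve a few hundred MB for the pagecache instead of the 8m derived from a 424k store,
# leaving roughly 1.6-1.7 GB for the OS plus Lucene/Netty native memory:
dbms.memory.pagecache.size=512m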


2 REPLIES


Agreed - historically our graphs have only been a few hundred MB in size, but with multi-tenancy we'll be re-evaluating what size EC2 instance we use. For the time being we've been passing in --memory=nG (shown below), where n is the base memory of the EC2 instance multiplied by an offset factor for more aggressive memory usage / heap allocation, since Neo4j is the only service on this box.
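For reference, a hypothetical invocation along those lines; the 1.5x factor and resulting 6g value are assumptions for illustration, not figures from the thread:

# e.g. a 4 GB instance with an assumed 1.5x factor
./bin/neo4j-admin memrec --memory=6g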