11-05-2018 10:58 AM
If I am not mistaken, Neo4j does not keep 100% of the graph in memory, only the portion that is actively used and required.
How much data does Neo4j keep in memory, and what decides which nodes and relationships are kept in memory and which are left on disk?
11-05-2018 11:02 AM
This page gives a breakdown of how memory management works in Neo4j.
https://neo4j.com/docs/operations-manual/current/performance/memory-configuration/
Basically, if your page cache is bigger than the store size on disk plus the indexes you're using, then your entire database will end up in memory once you start querying it, as most of it becomes "hot" and gets loaded into the page cache. The page cache is used to cache the Neo4j data and native indexes. Caching graph data and indexes in memory helps avoid costly disk access and gives the best performance.
So how much of your graph is kept in memory is entirely configurable, and up to you to select. Ideally the entire graph is in memory and this yields best performance.
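To make the rule of thumb above concrete, here is a minimal Python sketch that compares the on-disk size of a store directory with a chosen page cache size. The function names, the example path, and the 20% headroom figure are assumptions made for illustration and are not part of Neo4j. It also counts every file under the directory (including transaction logs, which the page cache does not hold), so it will overestimate somewhat.

```python
import os

def dir_size_bytes(path):
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total

def fits_in_page_cache(store_bytes, page_cache_bytes, headroom=0.2):
    """True if the store, plus an assumed growth margin, fits in the page cache.

    headroom is a hypothetical safety margin, not a Neo4j setting.
    """
    return store_bytes * (1 + headroom) <= page_cache_bytes
```

For example, `fits_in_page_cache(dir_size_bytes("/path/to/graph.db"), 8 * 1024**3)` would check a store against an 8 GiB page cache; the path here is a placeholder for wherever your database files actually live.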
11-05-2018 11:36 AM
@david.allen Thanks, I get it now. Still, for big data sets, keeping all the data in memory is not possible. So can we predefine exactly which part of the data should always be in memory and which part can stay on disk until needed?
E.g. always keep nodes with the Employee label in the page cache, never only on disk, but don't load nodes with, say, the Address label into memory until required?
11-05-2018 12:19 PM
No, I don't believe that level of configurability is possible. The page cache tends to keep in memory the things that have been most recently used. You could probably preload the page cache with the contents you want by restarting the database and then issuing queries, for example with APOC warmup:
https://neo4j-contrib.github.io/neo4j-apoc-procedures/#_warmup
But you can't pin those elements in memory. As queries come in, you'll need other bits of data, and stuff that is frequently used will displace things that aren't. This is actually a net positive for overall system performance.
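The displacement described above can be illustrated with a toy least-recently-used cache in Python. This is a simplified model and not Neo4j's actual eviction algorithm; the class name, the capacity, and the use of label names as stand-ins for pages are all assumptions made for the example.

```python
from collections import OrderedDict

class LRUPageCache:
    """Toy LRU cache: the least recently used page is evicted when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # ordered from oldest to most recent use

    def access(self, page_id):
        """Touch a page. Returns True on a hit, False on a miss (a simulated disk read)."""
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # mark as most recently used
            return True
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        self.pages[page_id] = True
        return False

# Hypothetical workload: "Address" is warmed up once but never touched again.
cache = LRUPageCache(capacity=2)
for page in ["Employee", "Address", "Employee", "Project"]:
    cache.access(page)
print(list(cache.pages))  # ['Employee', 'Project']
```

In this sketch the warmed-up "Address" page is dropped as soon as the cache is full and other pages are in more active use, which is the behavior to expect after a warmup: the contents stay only as long as the workload keeps touching them.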