Neo4j

MikerT86 · 09-18-2022

Hi all.

I'm using Neo4j 4.4.4 Community Edition in Docker, deployed on k8s. The app is written in Python.

The resources reserved for the pod: 80 GB RAM, 10 CPUs, 80 GB ephemeral storage. neo4j.conf: 46 GB page cache and automatically allocated heap size.

The problem is the following: when the DB starts up it consumes around 28 GB; under workload the memory usage keeps growing until it reaches the pod's memory limit. Should I strictly follow the suggestions from neo4j-admin memrec?

I'm using the HTTP API with the transaction commit endpoint, which gave us pretty good performance when sending a large number of small simultaneous requests. The query is basically the same each time, but the input data (starting nodes) is constantly changing.

Here is an example of the query:

MATCH (source_node: Person) WHERE source_node.name in $inputs
MATCH (source_node)-[r]->(child_id:InternalId)
WHERE r.valid_from <= datetime($actualdate) < r.valid_to
WITH [type(r), toString(date(r.valid_from)), child_id.id] as child_path, child_id, false as filtered
OPTIONAL MATCH p_path = (child_id)-[:HAS_PARENT_ID*0..50]->(parent_id:InternalId)
    WHERE all(a in relationships(p_path) WHERE a.valid_from <= datetime($actualdate) < a.valid_to) AND 
        NOT EXISTS{ MATCH (parent_id)-[q:HAS_PARENT_ID]->() WHERE q.valid_from <= datetime($actualdate) < q.valid_to}
    WITH DISTINCT last(nodes(p_path)) as i_source,
    reduce(st = [], q IN relationships(p_path) | st + [type(q), toString(date(q.valid_from)), endNode(q).id])
    as parent_path, CASE WHEN length(p_path) = 0 THEN NULL ELSE parent_id END as parent_id, child_path

    OPTIONAL MATCH (i_source)-[r:HAS_ISSUER_ID]->(issuer_id:IssuerId)
    WHERE r.valid_from <= datetime($actualdate) < r.valid_to
    RETURN DISTINCT CASE WHEN issuer_id IS NULL THEN child_path + parent_path + [type(r), NULL, "NOT FOUND IN RELATION"]
    ELSE child_path + parent_path + [type(r), toString(date(r.valid_from)), toInteger(issuer_id.id)]
    END as full_path, issuer_id, CASE WHEN issuer_id IS NULL THEN true ELSE false END as filtered

And the request example:

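import requests

# json_data holds the Cypher statement and its parameters; headers holds the HTTP headers.
result = requests.post(
    "http://neo4j.hostname.com:7474/db/neo4j/tx/commit",
    json=json_data,
    headers=headers,
).json()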
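
For context, json_data and headers in the request above could be built roughly like this. It is only a sketch: build_request, the credentials, the sample name and the sample date are placeholders, and QUERY is a shortened stand-in for the full Cypher statement shown earlier. The "statements" / "statement" / "parameters" layout is the body format the transactional HTTP endpoint expects.

```python
import base64

# Shortened placeholder for the full query shown earlier in the post.
QUERY = (
    "MATCH (source_node:Person) WHERE source_node.name IN $inputs "
    "RETURN source_node"
)


def build_request(inputs, actualdate, user, password):
    """Build the JSON body and headers for POST /db/neo4j/tx/commit.

    All argument values are supplied by the caller; nothing here is
    taken from a real deployment.
    """
    # HTTP Basic auth token built from the (placeholder) credentials.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json;charset=UTF-8",
        "Authorization": f"Basic {token}",
    }
    # One statement per request; $inputs and $actualdate are passed as
    # parameters so the query text itself stays identical between calls.
    json_data = {
        "statements": [
            {
                "statement": QUERY,
                "parameters": {"inputs": inputs, "actualdate": actualdate},
            }
        ]
    }
    return json_data, headers


json_data, headers = build_request(
    ["Alice"], "2022-09-18T00:00:00Z", "neo4j", "secret"
)
```

Keeping the changing values in "parameters" rather than in the query string means every request sends the same statement text, which matches the "same query, different starting nodes" pattern described above.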

When memory consumption hits the limit, performance drops rapidly.

1. Could you please explain why exactly this is happening and how to avoid the memory leak? Does the performance drop because of GC?

2. Can I use the Python driver instead of the HTTP API with the transaction commit option?

MikerT86 · 09-19-2022

Additionally, I would like to ask: when we set the page_cache and heap_size parameters, why does memory consumption still keep growing while transactions are running? For example: we have 80 GB RAM, page_cache 46G and heap size 23G. The DB starts with around 68 GB RAM consumed, which is OK, but when transactions start, memory consumption immediately starts to grow beyond 70 GB...
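
To make the numbers in this example concrete, here is the rough budget arithmetic as a small sketch. The helper name remaining_headroom_gb is made up for illustration, and it only subtracts the two configured pools from the pod limit; it does not model how Neo4j actually allocates memory.

```python
def remaining_headroom_gb(pod_limit_gb, page_cache_gb, heap_gb):
    """Memory left in the pod once page cache and JVM heap are fully used."""
    return pod_limit_gb - (page_cache_gb + heap_gb)


# Values from the example: 80 GB pod, 46G page cache, 23G heap.
configured_gb = 46 + 23
headroom_gb = remaining_headroom_gb(80, 46, 23)

print(f"page cache + heap: {configured_gb} GB, headroom: {headroom_gb} GB")
```

So page cache and heap together already account for 69 GB, and everything else in the container (the JVM itself, thread stacks, network buffers and so on) has to fit into the remaining 11 GB.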

Transactions Committing and Memory Management