Neo4j

p_xcx · 08-12-2022

Hi there,

I am working with Neo4j for the first time for my thesis. I am using Neo4j Community 4.4.10 and my server has 180 GB of RAM. For my project I am using a 200 GB dataset that consists of nodes labeled :e and their relationships.

I am using the cypher shell and the first thing I do is load the nodes into the RAM with:

CALL apoc.warmup.run(TRUE,TRUE,TRUE);

After that, roughly 30 GB of RAM is in use. The warmup takes at least four minutes.

Then I have two different queries I want to run.

The first one collects up to 1000 paths starting at the entity with nodeid "Q886" and is 'fast' at about 2 seconds:

MATCH (n:e {nodeid: 'Q886'})
CALL apoc.path.expandConfig(n, {minLevel: 1, maxLevel: 1})
YIELD path
RETURN nodes(path) AS nodes, relationships(path) AS relations
LIMIT 1000;

(Attached image: plan(7).png)

If I don't limit it to 1000, it takes longer because it goes through more nodes in the first step. Some nodes also take much longer than others.

The second query returns the number of outgoing relationships of a node. With the same node "Q886" this takes a minute.

(Attached image: plan(6)(1).png)

Notably, it goes through more nodes at the beginning.

So my question: What could cause this?

I am guessing it has to do with the RAM usage. Maybe I have to index all nodes that have a nodeid.

Thank you very much!

best regards from a frustrated student 🙂

Cobra · 08-12-2022

Hello @p_xcx 😊

Do you have a UNIQUE CONSTRAINT on your "e" label for the "nodeid" property?

CREATE CONSTRAINT constraint_e_node_id IF NOT EXISTS FOR (n:e) REQUIRE n.nodeid IS UNIQUE;
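
To check that the constraint is in place and that the planner actually uses it, something like the following may help. This is only a sketch, assuming Neo4j 4.4 syntax and the constraint name from the statement above. The index name index_e_node_id is a hypothetical name chosen for illustration, and the operator names in the comments are what I would expect to see, not output taken from this database.

```cypher
// List constraints; constraint_e_node_id should appear once it has been created.
SHOW CONSTRAINTS;

// A uniqueness constraint is backed by an index. It needs to be ONLINE
// before the planner can use it.
SHOW INDEXES;

// If nodeid values are not unique, the constraint cannot be created.
// A plain index is the alternative (index_e_node_id is a hypothetical name):
// CREATE INDEX index_e_node_id IF NOT EXISTS FOR (n:e) ON (n.nodeid);

// Profile the lookup on its own. Expectation (not verified here): the plan
// starts with an index seek instead of NodeByLabelScan followed by a Filter.
PROFILE MATCH (n:e {nodeid: 'Q886'}) RETURN n;
```

On a 200 GB store, building the backing index can take a while, so the seek may only show up in the plan once the index is reported as ONLINE.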

Moreover, how many nodes and relationships are in your database? Your database looks oversized.
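
A minimal sketch for getting those numbers, assuming APOC is installed (the apoc.warmup.run call in the question suggests it is):

```cypher
// Total node and relationship counts. Unfiltered counts like these are
// normally answered from the count store, so they should return quickly.
MATCH (n) RETURN count(n) AS nodeCount;
MATCH ()-[r]->() RETURN count(r) AS relCount;

// Alternative with APOC: totals plus a per-label breakdown.
CALL apoc.meta.stats() YIELD nodeCount, relCount, labels
RETURN nodeCount, relCount, labels;
```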

Regards,
Cobra

glilienfield · 08-12-2022

You should definitely create an index as @Cobra suggested (the uniqueness constraint creates one) to replace the initial NodeByLabelScan with an indexed lookup. In addition, I think you can rewrite the query more efficiently. In the first query you set both the minimum and maximum path length to 1. This is equivalent to a direct relationship, so you can use a simple pattern match instead of apoc.path.expandConfig. From the explain plan for your second query, it looks like you are matching on the same node and finding the number of relationships with the 'size' function (note that this usage of size has been deprecated). If my interpretation is correct, then you can combine the two queries into one, as follows:

MATCH (n:e{nodeid: 'Q886'})-[r]-(m)
WITH n, r, m
LIMIT 1000
WITH n, collect(m) as children, collect(r) as relations
RETURN n + children as nodes, relations, size(relations) as count

Do you really have more than 1000 nodes related to this single node? I believe @Cobra was addressing the same concern.
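
If you only need the number of outgoing relationships, here is a sketch of two ways to get it in 4.4 without the deprecated size() on a pattern. I have not seen the text of the second query, so this assumes it needs nothing more than the count for one node. The second option requires APOC.

```cypher
// Option 1: pattern comprehension, plain Cypher.
MATCH (n:e {nodeid: 'Q886'})
RETURN size([(n)-->() | 1]) AS outgoing;

// Option 2: APOC degree function. As far as I know it asks for the
// node's degree instead of returning every relationship.
MATCH (n:e {nodeid: 'Q886'})
RETURN apoc.node.degree.out(n) AS outgoing;
```

Running both with PROFILE once the index exists should show which one does less work on your data.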


Why are my queries so slow..?