12-23-2021 11:49 AM
Hi,
I upgraded my Neo4j Docker container from 4.2.4 to 4.4.2 today, and I'm facing huge performance issues when running a simple variable-length path traversal.
A query like the following:
MATCH (t:Tree)-[:HAS_CHILD_BLOB|HAS_CHILD_TREE*]->(b:Blob)
WHERE t.hash = '51ac215688f1b1713e9a7992292e75f86defd5c9'
RETURN count(b)
This query takes around 200 seconds to execute, for a count of only 300 blobs.
I don't recall having performance issues like this with the previous version.
When I SSH into my instance, I see the CPU pinned at 100% for the whole time the query is executing.
Is that expected?
Here is the profile of the query:
Cypher version: CYPHER 4.4, planner: COST, runtime: INTERPRETED. 353729021 total db hits in 201249 ms.
All I want to do is reach all the children of this Tree, recursively.
And for information, the maximum path length is 11.
Any tips ?
Something wrong with my query, or the planner here ?
I just tried to run the same query (on a different data set), on AuraDB:
Started streaming 1 records in less than 1 ms and completed after 94 ms.
It works fine.
But the query planner is different:
What's happening on my docker instance ??
Thanks !
01-01-2022 06:20 AM
01-02-2022 12:43 PM
Hi @dana.canzano and thank you very much for your answer !
The performance issue occurs on the Community Edition, so I can't test your query tuning trick, since it's limited to the Enterprise Edition.
01-04-2022 05:57 PM
This looks like some buggy planner behavior. Can you let us know how many :Tree nodes are in your db, how many :Blob nodes, how many :HAS_CHILD_BLOB relationships, and how many :HAS_CHILD_TREE relationships?
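A minimal sketch of one way to gather those four counts in a single result, assuming only the labels and relationship types that appear in the original query (the `item` and `total` column names are made up for illustration):

```cypher
// Count nodes per label and relationships per type.
// Each branch returns the same two columns so that UNION ALL can combine them.
MATCH (t:Tree)
RETURN 'Tree nodes' AS item, count(t) AS total
UNION ALL
MATCH (b:Blob)
RETURN 'Blob nodes' AS item, count(b) AS total
UNION ALL
MATCH ()-[r:HAS_CHILD_BLOB]->()
RETURN 'HAS_CHILD_BLOB relationships' AS item, count(r) AS total
UNION ALL
MATCH ()-[r:HAS_CHILD_TREE]->()
RETURN 'HAS_CHILD_TREE relationships' AS item, count(r) AS total
```

Simple counts like these are normally cheap, since they do not require any traversal.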
In the meantime, see if this performs better.
MATCH (t:Tree)-[:HAS_CHILD_BLOB|HAS_CHILD_TREE*]->(b)
WHERE t.hash = '51ac215688f1b1713e9a7992292e75f86defd5c9'
WITH t, b, 1 as ignored
WHERE b:Blob
RETURN count(b)
We want to see a plan like the second one you posted, that only expands from your starting t node and doesn't perform any label scans or cartesian product operations.
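If the planner still does not start from t, a further variation worth profiling is sketched below. It rests on two assumptions that are not confirmed in this thread: that an index on :Tree(hash) exists (the index name in the comment is hypothetical), and that the maximum path length of 11 mentioned in the original post holds for the whole data set. It is a sketch to compare against the plans above, not a guaranteed fix.

```cypher
// Assumption: an index on :Tree(hash) exists. If it does not, it could be
// created first with something like (the name tree_hash is hypothetical):
//   CREATE INDEX tree_hash IF NOT EXISTS FOR (t:Tree) ON (t.hash)

// PROFILE shows the executed plan and db hits, for comparison with the earlier runs.
PROFILE
MATCH (t:Tree)-[:HAS_CHILD_BLOB|HAS_CHILD_TREE*1..11]->(b)
// The hint asks the planner to find t through the index instead of scanning labels.
USING INDEX t:Tree(hash)
WHERE t.hash = '51ac215688f1b1713e9a7992292e75f86defd5c9'
// Same trick as above: filter on the :Blob label only after the expansion.
WITH t, b, 1 AS ignored
WHERE b:Blob
RETURN count(b)
```

Index hints are available in the Community Edition as well. Note that count(b) counts one row per matched path, so if a blob can be reached through more than one path, count(DISTINCT b) would be needed to count unique blobs.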