Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
06-23-2022 02:23 AM - edited 06-23-2022 02:26 AM
Hi everyone.
We are using Neo4J for a while now in our company, but we are struggling with some of our queries (mostly performance issues on counts)
I would like to explain what it is we are trying to achieve, and hopefully someone can help us figuring it out.
The Setting
We have roughly 2 million :Product nodes with many relationships to other nodes like :Retailer, :Universe,...
e.g:
(:Product)-[:HAS_RETAILER]->(:Retailer)
(:Product)-[:HAS_UNIVERSE]->(:Universe)
...
All nodes have at least one index (usually on an id field.
Also, a product can have multiple retailer links and multiple universe links.
Note: These 2 relationships are just a few of many
The Problem
Whenever we try to filter our :Product by a single relationship, it's really fast.
// Note: the real query has the variables as query parameters
// not hard coded like this
match (p:Product)-[:HAS_RETAILER]->(r:Retailer)
where r.id in ["id1", "id2", ...]
return count(distinct p)
returning a count on the distinct products takes a few 100 milliseconds.
Executing the plan shows us the proper index is hit and there is no issue with this whatsoever.
The problem however is trying to filter the products further, lets say by :Retailer and :Universe
Below you can find an example of a query we try to run
// Note: the real query has the variables as query parameters
// not hard coded like this
match (p:Product)-[:HAS_RETAILER]->(r:Retailer)
where r.id in ["id1", "id2", ...]
with distinct p
match (p)-[:HAS_UNIVERSE]->(u:Universe)
where u.id in ["1", "2", ...]
return count(distinct p)
This query returns a lot slower (a few seconds).
Investigating the plan it seems like no indexes are hit anymore the moment you reach a subsequent filter.
If the first filter returns 500.000 products, it seems the subsequent filter would have to inspect all of them to filter it down.
I believe we must be doing something wrong here but I cannot find a solution how to handle a situation like this and make the counts fast.
If we return the products itself, not the count. and limit the result then it is still really fast. However when paginating using skip and limit the results come in slower and slower the higher the skip is. Comparable with the response time of a count.
I am hoping someone has a solution for this or a good pattern to deal with these kinds of filtering.
Kind regards
06-23-2022 02:51 AM
Hi @jim_ruts !
Before getting some fun, may I ask what Neo4J version are you using?
06-23-2022 03:18 AM - edited 06-23-2022 03:41 AM
Hey @bennu_neo ,
thanks for replying.
We are using version 4.4 (we use Aura Professional (8GB memory, 2 CPU, 16GB Storage) for 3 environments.)
All the sessions of the conference are now available online