Neo4j

jim_ruts · ‎06-23-2022

Hi everyone.

We are using Neo4J for a while now in our company, but we are struggling with some of our queries (mostly performance issues on counts)

I would like to explain what it is we are trying to achieve, and hopefully someone can help us figuring it out.

The Setting
We have roughly 2 million :Product nodes with many relationships to other nodes like :Retailer, :Universe,...
e.g:
(:Product)-[:HAS_RETAILER]->(:Retailer)
(:Product)-[:HAS_UNIVERSE]->(:Universe)
...

All nodes have at least one index (usually on an id field.
Also, a product can have multiple retailer links and multiple universe links.

Note: These 2 relationships are just a few of many

The Problem
Whenever we try to filter our :Product by a single relationship, it's really fast.

// Note: the real query has the variables as query parameters
// not hard coded like this
match (p:Product)-[:HAS_RETAILER]->(r:Retailer)
where r.id in ["id1", "id2", ...]
return count(distinct p)

returning a count on the distinct products takes a few 100 milliseconds.
Executing the plan shows us the proper index is hit and there is no issue with this whatsoever.

The problem however is trying to filter the products further, lets say by :Retailer and :Universe
Below you can find an example of a query we try to run

// Note: the real query has the variables as query parameters
// not hard coded like this
match (p:Product)-[:HAS_RETAILER]->(r:Retailer)
where r.id in ["id1", "id2", ...]
with distinct p
match (p)-[:HAS_UNIVERSE]->(u:Universe) 
where u.id in ["1", "2", ...]
return count(distinct p)

This query returns a lot slower (a few seconds).
Investigating the plan it seems like no indexes are hit anymore the moment you reach a subsequent filter.
If the first filter returns 500.000 products, it seems the subsequent filter would have to inspect all of them to filter it down.

I believe we must be doing something wrong here but I cannot find a solution how to handle a situation like this and make the counts fast.

If we return the products itself, not the count. and limit the result then it is still really fast. However when paginating using skip and limit the results come in slower and slower the higher the skip is. Comparable with the response time of a count.

I am hoping someone has a solution for this or a good pattern to deal with these kinds of filtering.

Kind regards

bennu_neo · ‎06-23-2022

Hi @jim_ruts !

Before getting some fun, may I ask what Neo4J version are you using?

Oh, y’all wanted a twist, ey?

jim_ruts · ‎06-23-2022

Hey @bennu_neo ,

thanks for replying.

We are using version 4.4 (we use Aura Professional (8GB memory, 2 CPU, 16GB Storage) for 3 environments.)

Neo4j

Slow subsequent filtering query