cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Cypher: Alias length >4 chars makes query 25x slower

  1. I discovered today that alias length can affect query in a huge way. I'm doing a count query with some date filters, with indexes on date property (NodeIndexSeekByRange). Profiling this query i discovered that using an alias with 4 or less characters makes the query 25x faster than using alias with more than 4 characters. The NodeIndexSeekByRange went from ~100 000 db hits to ~2700 db hits and total query db hits went from ~500 000 to ~21 000 by changing alias from 5 characters to 4 characters.
    (these are not my actual queries, but to illustrate my point)
    MATCH (aaaaa:Node)
    WHERE aaaaa.date < datetime($someDate) < aaaaa.date
    RETURN COUNT(*) AS count
    
    = 500 000 total db hits
    
    
    MATCH (aaaa:Node)
    WHERE aaaa.date < datetime($someDate) < aaaa.date
    RETURN COUNT(*) AS count
    
    = 21 000 total db hits
    So apparently alias length can affect query planner in a huge way, is this covered in the docs?
  2. This is neo4j 4.4 with BTREE index btw
  3. Would love some under the hood explanation of this, I'm assuming this goes down to byte level
3 REPLIES 3

Bertel
Node

After some more digging it seems that this is related to predicate order in the where statement.

My original query was something like this:

MATCH (n:Node)
WHERE (datetime($date1) <= n.date1 <= datetime($date2) 
OR datetime($date1) <= n.date2 <= datetime($date2)) 
OR (datetime($date1) > n.date1 AND datetime($date2) < n.date2)
RETURN COUNT(*) AS count

Changing the predicate order to this seems to have fixed the problem.

MATCH (n:Node)
WHERE (datetime($date1) > n.date1 AND datetime($date2) < n.date2) 
OR (datetime($date1) <= n.date1 <= datetime($date2) 
OR datetime($date1) <= n.date2 <= datetime($date2))
RETURN COUNT(*) AS count

So predicate order is really important for large count queries, though it is still strange that changing the alias would change the query planners interpretation of predicate order.

Bertel
Node

I am having issues again with this query

 

MATCH (n:Node)
WHERE (datetime($date1) > n.date1 AND datetime($date2) < n.date2) 
OR (datetime($date1) <= n.date1 <= datetime($date2) 
OR datetime($date1) <= n.date2 <= datetime($date2))
RETURN COUNT(*) AS count

 

I have a b-tree index on properties n.date1 and n.date2 (both separate and composite).

In this predicate the query planner will sometimes seek n.date1 index first, and sometimes seek n.date2 index first. 

 

(datetime($date1) > n.date1 AND datetime($date2) < n.date2) 

 

seeking n.date1 index first has huge performance loss vs seeking n.date2 index first, and i can't understand how the query planner decides what to use here, and how can i force it to use n.date2 index first? I have tried query hints like below, but with it still sometimes chooses to use n.date1 index first. Does creation order of index affect how the planner decides to use them?

 

MATCH (n:Node)
USING INDEX n:Node(date2)
USING INDEX n:Node(date1)
WHERE (datetime($date1) > n.date1 AND datetime($date2) < n.date2) 
OR (datetime($date1) <= n.date1 <= datetime($date2) 
OR datetime($date1) <= n.date2 <= datetime($date2))
RETURN COUNT(*) AS count

 

My main questions:
How does the query planner decide which index to seek first?
How can i force it to seek n.date2 index before n.date1 index?

 

 

Bertel
Node

Changing predicate order again seems to fix the problem so that n.date2 index is seeked first. Though this query feels unreliable to me, since I don't really understand the query planner decisions, and can't find any docs on it. 

 

MATCH (n:Node)
WHERE (datetime($date2) < n.date2 AND datetime($date1) > n.date1) 
OR (datetime($date1) <= n.date2 <= datetime($date2)
OR datetime($date1) <= n.date1 <= datetime($date2))
RETURN COUNT(*) AS count

 

Also the indexes must be created in order such that the n.date2 index id > n.date1 index id