cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Poor performance of the Query despite far less number of DB hits

Hello Graph Community,


I have been recently working on tuning the performance of a full-text search query written in Cypher. Since the query contained text conditions such as CONTAINS, it is suggested to use a FULLTEXT index to speed up the performance of the query. Following the same, query written using FULLTEXT index, when profiled, showed far fewer DB hits than what was there in the original query I wrote. This query however could stand the performance test and performed 5 times slower than the original query.


Original Cypher Query

MATCH path = (a:TOP_LEVEL)-[:Child *1..]->(q)-[:Child *0..]->(d)
WHERE q.name CONTAINS 'A1' AND NOT EXISTS {
  MATCH (p)-[:Child*]->(q)-[:Child*]->(r:LEAF)
  WHERE p.name CONTAINS $text AND
    $project_id in r.project_id
}
RETURN path

This query above when profiled showed around 13K DB hits.

New Query

Now I went further and wrote another query using FULLTEXT indexes. The Cypher query generated was as follows -


CALL db.index.fulltext.queryNodes('full_text_idx', '*A1*') YIELD node
WITH node as q
MATCH path = (a:TOP_LEVEL)-[:Child*]->(q)-[:Child*0..]->(d:LEAF)
WHERE EXISTS { 
MATCH (q)-[r:Child]->() 
WHERE $project_id in r.project_ids} OR (('LEAF' IN labels(q) OR 'ROLLUP' IN labels(q)) AND $project_id in q.project_id)
RETURN path

The new query took around 700 DB hits. This provided strong evidence that the new query should be faster than the old query by manifolds.


Results

When tested the first query easily outperformed the new query. While the new query took 90ms-100ms on average, the old query was able to give the same results in 20ms-30ms.
I am not able to figure out why this can happen.

1 REPLY 1

Best I can tell, the two queries as pasted here won't return the same result set. An apples to apples comparison would be useful to do first.

For example, the second query doesn't even do this test " WHERE p.name CONTAINS $text", and has other new conditions.

Have you used PROFILE on the queries to look at the row count difference? dbhits is interesting but row counts processed may be more important.