String comparison performance

quan_nguyen
Node Clone

Hi all,

I am wondering about the performance of the following string comparison in where clause:
Case 1:

MATCH (p: Person)
WHERE NOT p.name='Person 1'
AND NOT p.name='Person 2'
RETURN p

Case 2:

MATCH (p: Person)
WHERE NOT (p.name='Person 1' OR p.name='Person 2')
RETURN p

Case 3:

MATCH (p: Person)
WHERE NOT p.name IN ['Person 1', 'Person 2']
RETURN p

I have tried on my database but the result is not united, sometimes case 3 gets be best response time and db hits, sometimes the best is case 1 and case 2.
Can you please make this clear for me?

1 ACCEPTED SOLUTION

Hi @quan.nguyen,

Have you tried using the keyword EXPLAIN and PROFILE on before your query?

  • EXPLAIN will not run the query, but show you the predicted cost of the query
  • PROFILE will run the query and show the cost of the query

In the your case, the Case 1 & Case 2 should perform with around the same cost. Basically it would scan all node label Person two times, one time for 'Person 1' the second time for 'Person 2'.

In Case 3 it will only scan all the node label Person once, checking both string 'Person 1' and 'Person 2' in one go.

The query Case 3 with the IN keyword would perform even better in comparison with the other two when you actually have more strings to compare (e.g p.name IN ['Person 1', 'Person 2, 'Person 3']

Hope this helps you with your decision!

View solution in original post

4 REPLIES 4

12kunal34
Graph Fellow

Hi @quan.nguyen

in this case i have one suggestion for you for the best performance .

if you want to searchinf for node on the basis of name then you should create index on this property
create index on :node(name)

this would resolve all your issues regarding db hits

please let me know if any other info required

Cheers

Thank you for your reply, @12kunal34

My case is more complicated and this is my example for the comparison. I have to consider this because my the property, that I need to filter out, is rarely used and it's not so selective (it has less then 10 different values).

Do you have any suggestions for this?

Hi @quan.nguyen,

Have you tried using the keyword EXPLAIN and PROFILE on before your query?

  • EXPLAIN will not run the query, but show you the predicted cost of the query
  • PROFILE will run the query and show the cost of the query

In the your case, the Case 1 & Case 2 should perform with around the same cost. Basically it would scan all node label Person two times, one time for 'Person 1' the second time for 'Person 2'.

In Case 3 it will only scan all the node label Person once, checking both string 'Person 1' and 'Person 2' in one go.

The query Case 3 with the IN keyword would perform even better in comparison with the other two when you actually have more strings to compare (e.g p.name IN ['Person 1', 'Person 2, 'Person 3']

Hope this helps you with your decision!