cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

How to Optimize a Query with Huge Where Clause

Hey, guys!

I have a question regarding the optimization of the following query.

MATCH (u:User { userId:"dOnBdddDpwcma3xlCje1"})-[:SPONSORED*..3]->(child:User)
WHERE child.achievedRank < 1000
  AND child.isMvp = True 
  AND child.userId IN ["d2JuOnwwwqsdXZpdDpwcmasddxlOjE0MD444Tkz", "dXJ33uByZ4dDpwcm9maWaalOjE0E1", "1XDdduOnBydadsq21dDpwcm9maWxlOjDA0NTk1"]
RETURN child.userId

The problem is with the following line:

AND child.userId IN ["d2JuOnwwwqsdXZpdDpwcmasddxlOjE0MD444Tkz", "dXJ33uByZ4dDpwcm9maWaalOjE0E1", "1XDdduOnBydadsq21dDpwcm9maWxlOjDA0NTk1"]

I want to filter with a list of users. But when i put a lot of userIds in the IN clause it is extremely slow.
In my case i need to use a filter of about 10000 users but it takes like 15-20 seconds.
I would really appreciate any suggestion that would help me optimize this query.

The Neo4j used is version 3.4.0 community edition

Here is the Execution Plan when trying to filter with about 10k userIds

7 REPLIES 7

I am fairly new to Neo4J and can't claim I know the internals, but what you are doing here is a lot of string comparisons which is not very efficient. If you can hash the user ids in advance (i.e. the list of ids in the WHERE ... IN... construction does not change every time) then it would be a constant time lookup for each uid.

Thanks for suggesting that, Ukirik! If i understood properly, your solution would work if the list does not change. The list of usersId changes, it is not the same every time. Correct me if I am wrong.

Hi,

Are you created a single-property or composite index?

Single-property index

CREATE INDEX ON :User(userId);

Composite index

CREATE INDEX ON :User(userId, achievedRank, isMvp);

It is a single property index

CREATE INDEX ON :User(userId);

Although creating a composite index would help a little, it will not solve the bigger problem, which is filtering by 10k users. Do you have any suggestions about that?

Hi @stanimir.venkov,
Can you try this?

PROFILE
WITH ["d2JuOnwwwqsdXZpdDpwcmasddxlOjE0MD444Tkz", "dXJ33uByZ4dDpwcm9maWaalOjE0E1", "1XDdduOnBydadsq21dDpwcm9maWxlOjDA0NTk1"] AS childList
MATCH (u:User { userId:"dOnBdddDpwcma3xlCje1"})-[:SPONSORED*..3]->(child:User{isMvp:True)
WITH child, childList
WHERE child.achievedRank < 1000
  AND child.userId IN childList
RETURN child.userId

I changed some things by removing the filter for isMvp and getting all childs that are sponsored - 2000000 levels downstairs so i would receive more result to see exactly is it faster or not.

MATCH (u:User { userId:"headUserId1232131"})-[:SPONSORED*..2000000]->(child:User)
WHERE child.achievedRank < 1000
  AND child.userId IN ["id1","id2" ... "id1000"] //Here we have 1000 userIds
RETURN child.userId

Thats the profile:

Now the same query, but with your suggestion:

PROFILE
WITH ["id1","id2"..."id10000"] as childList // Here we have the same 1000 users from the upper query
MATCH (u:User { userId:"headUserId142111"})-[:SPONSORED*..2000000]->(child:User)
WITH child, childList
WHERE child.achievedRank < 1000
  AND child.userId IN childList
RETURN child.userId

The profile:

I don't see any improvement with the speed of the query, i`d say is pretty much the same if not even a little slower, but thanks for suggestion. Do you have any other idea?

Hello @stanimir.venkov

How many RAM are you using for your database?
Do you have a good processor?

Myabe you can try to increase the RAM if possible

Regards,
Cobra