Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
06-09-2020 05:26 AM
Hey, guys!
I have a question regarding the optimization of the following query.
MATCH (u:User { userId:"dOnBdddDpwcma3xlCje1"})-[:SPONSORED*..3]->(child:User)
WHERE child.achievedRank < 1000
AND child.isMvp = True
AND child.userId IN ["d2JuOnwwwqsdXZpdDpwcmasddxlOjE0MD444Tkz", "dXJ33uByZ4dDpwcm9maWaalOjE0E1", "1XDdduOnBydadsq21dDpwcm9maWxlOjDA0NTk1"]
RETURN child.userId
The problem is with the following line:
AND child.userId IN ["d2JuOnwwwqsdXZpdDpwcmasddxlOjE0MD444Tkz", "dXJ33uByZ4dDpwcm9maWaalOjE0E1", "1XDdduOnBydadsq21dDpwcm9maWxlOjDA0NTk1"]
I want to filter with a list of users. But when i put a lot of userIds in the IN clause it is extremely slow.
In my case i need to use a filter of about 10000 users but it takes like 15-20 seconds.
I would really appreciate any suggestion that would help me optimize this query.
The Neo4j used is version 3.4.0 community edition
Here is the Execution Plan when trying to filter with about 10k userIds
06-09-2020 06:04 AM
I am fairly new to Neo4J and can't claim I know the internals, but what you are doing here is a lot of string comparisons which is not very efficient. If you can hash the user ids in advance (i.e. the list of ids in the WHERE ... IN... construction does not change every time) then it would be a constant time lookup for each uid.
06-10-2020 05:09 AM
Thanks for suggesting that, Ukirik! If i understood properly, your solution would work if the list does not change. The list of usersId changes, it is not the same every time. Correct me if I am wrong.
06-09-2020 08:02 AM
Hi,
Are you created a single-property or composite index?
Single-property index
CREATE INDEX ON :User(userId);
Composite index
CREATE INDEX ON :User(userId, achievedRank, isMvp);
06-10-2020 05:04 AM
It is a single property index
CREATE INDEX ON :User(userId);
Although creating a composite index would help a little, it will not solve the bigger problem, which is filtering by 10k users. Do you have any suggestions about that?
06-11-2020 12:43 AM
Hi @stanimir.venkov,
Can you try this?
PROFILE
WITH ["d2JuOnwwwqsdXZpdDpwcmasddxlOjE0MD444Tkz", "dXJ33uByZ4dDpwcm9maWaalOjE0E1", "1XDdduOnBydadsq21dDpwcm9maWxlOjDA0NTk1"] AS childList
MATCH (u:User { userId:"dOnBdddDpwcma3xlCje1"})-[:SPONSORED*..3]->(child:User{isMvp:True)
WITH child, childList
WHERE child.achievedRank < 1000
AND child.userId IN childList
RETURN child.userId
06-16-2020 01:09 AM
I changed some things by removing the filter for isMvp and getting all childs that are sponsored - 2000000 levels downstairs so i would receive more result to see exactly is it faster or not.
MATCH (u:User { userId:"headUserId1232131"})-[:SPONSORED*..2000000]->(child:User)
WHERE child.achievedRank < 1000
AND child.userId IN ["id1","id2" ... "id1000"] //Here we have 1000 userIds
RETURN child.userId
Thats the profile:
Now the same query, but with your suggestion:
PROFILE
WITH ["id1","id2"..."id10000"] as childList // Here we have the same 1000 users from the upper query
MATCH (u:User { userId:"headUserId142111"})-[:SPONSORED*..2000000]->(child:User)
WITH child, childList
WHERE child.achievedRank < 1000
AND child.userId IN childList
RETURN child.userId
The profile:
I don't see any improvement with the speed of the query, i`d say is pretty much the same if not even a little slower, but thanks for suggestion. Do you have any other idea?
06-16-2020 03:49 AM
Hello @stanimir.venkov
How many RAM are you using for your database?
Do you have a good processor?
Myabe you can try to increase the RAM if possible
Regards,
Cobra
All the sessions of the conference are now available online