Neo4j

chris3 · ‎06-24-2019

Hello everyone!

I have been using Neo4j for a while now but just realized that I have a performance problem with the following query (simplified):

MATCH (u:User)<-[:U]-(:Profits)-[:DP]->(dp:DailyProfit) WHERE ID(u)=12345 AND dp.localDate>='20190601' AND dp.localDate<='20190624' RETURN dp

(FYI Result = only 24 DailyProfit nodes, one per day)

Instead of using the index on the ID, getting about 500 dp nodes and then filtering those on the localDate property, the query planner uses a NodeIndexSeekByRange on all of the 500k (!) DailyProfit nodes in the graph ... which takes about 1.5 seconds (!)

BTW the query with just the localDate>=x (and not a range >= and <=) takes 18ms (milli seconds that is!) because it does exactly what I would expect: Start with a single User node and filter the ~500 DailyProfit nodes...

Can I tell the query planner to start with the index on the ID of the user and filter the very small result with a range filter somehow?

Or any other suggestions to optimize this so that the planner doesn't always start with the range seek just because it thinks that is the fastest way (because of course it isn't since there are about 500 DailyProfits nodes per user but 500k in the entire DB)?

THX!

andrew_bowman · ‎06-24-2019

So this is a frustrating thing, the planner REALLY wants to use that index seek by range. Even if I massage the query to perform that lookup by id, it still wants to use an index seek by range and use a hash join between the results.

I'll raise this with the Cypher team, but in the meantime we can throw something into the query which will cause the planner to only use the localDate predicates as a filter and not for lookup. Try this out, verify it with EXPLAIN:

MATCH (u:User)<-[:U]-(:Profits)-[:DP]->(dp:DailyProfit) 
WHERE ID(u)=12345 
WITH u, dp, true as ignored // introducing a new variable to throw off the planner
WHERE dp.localDate>='20190601' AND dp.localDate<='20190624' 
RETURN dp

View solution in original post

andrew_bowman · ‎06-24-2019

So this is a frustrating thing, the planner REALLY wants to use that index seek by range. Even if I massage the query to perform that lookup by id, it still wants to use an index seek by range and use a hash join between the results.

I'll raise this with the Cypher team, but in the meantime we can throw something into the query which will cause the planner to only use the localDate predicates as a filter and not for lookup. Try this out, verify it with EXPLAIN:

MATCH (u:User)<-[:U]-(:Profits)-[:DP]->(dp:DailyProfit) 
WHERE ID(u)=12345 
WITH u, dp, true as ignored // introducing a new variable to throw off the planner
WHERE dp.localDate>='20190601' AND dp.localDate<='20190624' 
RETURN dp

chris3 · ‎06-24-2019

Hey Andrew,
I have tried all sorts of stuff ... but your solution is genius AND it works!
THX!!!
Cheers, Chris

PS: I was thinking along the lines of USING INDEX u:User(*ID*) ... or NOT USING INDEX dp:DailyProfit(localDate) ... or some way of telling the planner where to start since I know that it would be smarter to start at the user-index...

andrew_bowman · ‎06-24-2019

By default the planner SHOULD be preferring the graph id lookup (it isn't an index lookup so we can't use a hint for it) over all the others.

And in the case where I could get it to use the graph id lookup, it just couldn't let go of the index range scan on the others, so it would still be inefficient, so I had to resort to trickery.

I've raised the suggestion with our Cypher team on introducing a new planner hint to disable a lookup of any type on a node, so only expansions to that node would be possible. We'll see if that gets picked up later.

Neo4j

How to NOT use an index?