07-09-2020 12:19 AM
Hello,
I always find myself confused when profiling queries and find that at some point the number of DB hits increase somewhere along the execution path.
Take into consideration the following query:
MATCH (s:AudienceSegment {checksum: $checksum})<-[r:BELONGS_TO]-(t:Talent:Instagram)
WHERE t.iscore IS NOT NULL
AND t.iscore >= $iScoreFrom
AND t.iscore <= $iScoreTo
AND t.followers >= $followersFrom
AND t.followers <= $followersTo
AND t.avg_video_views >= $videoViewsFrom
AND t.avg_video_views <= $videoViewsTo
AND t.avg_engagement >= $avgEngagementFrom
AND t.avg_engagement <= $avgEngagementTo
AND r.affinity > 0 // <----
RETURN t AS talent, r AS belongsTo
ORDER BY r.affinity DESC
SKIP $offset LIMIT $limit;
Here is a screenshot of the query PROFILE:
How come, based on the above query params, we still have s and/or r rows that span beyond the matched t pool? I think of a filter statement as having an exclusive nature, in the sense that once we've matched an edge of a relationship, only the corresponding "neighbours" would be available for the rest of the query. Think of it as a funnel: the further we advance into the execution, the more pruning should be done along the way.
But as we can observe, when we reach the AND r.affinity > 0 filter, the pool of t's increases drastically.
I'd like to add that this query is running on a large data set and it takes anywhere between 40s and 1min to execute. We suspect that the query's poor performance is specifically due to the AND r.affinity > 0 clause.
Thoughts?
Thanks in advance.
07-09-2020 01:01 AM
Hi, perhaps this explanation helps.
Since you have an index on AudienceSegment.checksum, the "anchor" of your query, or the node that is found first, is s.
From there, it expands out through the BELONGS_TO relationship type to find the t's on the other end. This is the first Expand(All) you see in the plan.
There's a filter on the relationship r, and this check incurs db hits, as each relationship r has to be checked for the property affinity being greater than 0. At the end of the filter step, you see that the number of rows has indeed reduced.
From that point on, there is no further expansion and just projections and sorting required for your results.
It depends on the goal of your query and the graph model, but the more BELONGS_TO relationships that exist between any given pair of nodes, the more your query is going to cost because of the relationship property check.
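For example, to isolate how much of the cost comes from the expansion versus the relationship check, you could PROFILE the two stages separately, something like this (a sketch only, using the same parameters as your query):

```cypher
// Cost of the anchor lookup plus the expansion alone
PROFILE MATCH (s:AudienceSegment {checksum: $checksum})<-[:BELONGS_TO]-(t:Talent:Instagram)
RETURN count(t);

// Same pattern with the relationship property check added
PROFILE MATCH (s:AudienceSegment {checksum: $checksum})<-[r:BELONGS_TO]-(t:Talent:Instagram)
WHERE r.affinity > 0
RETURN count(t);
```

Comparing the db hits of the two plans shows how much the affinity check alone contributes.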
07-09-2020 01:25 AM
Hi Luanne,
Thanks for your prompt reply and explanation.
I would like to emphasise that we are also filtering the t's; I would expect this to reduce the number of matched r's.
Do you mean that the relationship filter takes precedence over the node filter on the other end?
To avoid this kind of expansion, I was considering splitting the WHERE clause into two and piping results from one to the other, something of the sort:
MATCH (s:AudienceSegment {checksum: $checksum})<-[r:BELONGS_TO]-(t:Talent:Instagram)
WHERE t.iscore IS NOT NULL
AND t.iscore >= $iScoreFrom
AND t.iscore <= $iScoreTo
AND t.followers >= $followersFrom
AND t.followers <= $followersTo
AND t.avg_video_views >= $videoViewsFrom
AND t.avg_video_views <= $videoViewsTo
AND t.avg_engagement >= $avgEngagementFrom
AND t.avg_engagement <= $avgEngagementTo
WITH s, r, t
WHERE r.affinity > 0 // <----
RETURN t AS talent, r AS belongsTo
ORDER BY r.affinity DESC
SKIP $offset LIMIT $limit;
But this had no effect on query performance.
Does this split make any sense in your opinion?
Thanks again.
07-09-2020 03:49 AM
No, moving the r.affinity filter to after the WITH will not make a difference, because the relationships have already been expanded into. If you know for a fact that it would be cheaper to expand from t to s, then perhaps you can play with some hints: https://neo4j.com/docs/cypher-manual/current/query-tuning/using/
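For instance, a join hint on t (a sketch only; whether it helps depends on your data distribution):

```cypher
MATCH (s:AudienceSegment {checksum: $checksum})<-[r:BELONGS_TO]-(t:Talent:Instagram)
USING JOIN ON t
WHERE r.affinity > 0
RETURN t AS talent, r AS belongsTo;
```

USING JOIN ON t asks the planner to build both sides of the pattern independently and hash-join them on t, instead of expanding everything out from s.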
07-10-2020 08:09 AM
Hi again,
Thanks Luanne for the follow up.
I've been experimenting with a few, but I am not confident I am heading down the right path, so I'd like to add some more context to our use case and share what I've been trying so far.
We have a fairly average-sized data set, ~700k nodes and ~100M relationships, which we expect to grow two- or three-fold by the end of the year.
We query this data set in a real-time setting; we can tolerate no more than a 5s response time.
- The number of s nodes is 30 times more than that of t nodes.
- We cannot hint any index on t nodes, since these can only be filtered by the query params provided above.
- heap.max_size and heap.initial_size are 13Gi.
We removed the infamous r.affinity > 0 WHERE clause in favour of an additional property filter, t.is_identification_enabled = true, on the t node itself, leveraging the following query:
MATCH (s:AudienceSegment {checksum: $checksum})<-[r:BELONGS_TO]-(t:Talent:Instagram)
WHERE t.iscore IS NOT NULL
AND t.iscore >= $iScoreFrom
AND t.iscore <= $iScoreTo
AND t.followers >= $followersFrom
AND t.followers <= $followersTo
AND t.avg_video_views >= $videoViewsFrom
AND t.avg_video_views <= $videoViewsTo
AND t.avg_engagement >= $avgEngagementFrom
AND t.avg_engagement <= $avgEngagementTo
AND t.is_identification_enabled = true
RETURN t AS talent, r AS belongsTo
ORDER BY t.iscore ASC
SKIP $offset LIMIT $limit;
With no difference whatsoever in query performance. I should also mention that we do allow ordering by r.affinity.
Results: Cypher version: CYPHER 3.5, planner: COST, runtime: SLOTTED. 131321 total db hits in 8330 ms.
Profile
The only hint I think we can provide to the query planner is a JOIN hint, since we cannot hint any index on the t nodes for the aforementioned reasons. Nonetheless, I did try other hints, mainly SCAN hints, leading to the following query:
MATCH (s:AudienceSegment {checksum: $checksum})<-[r:BELONGS_TO]-(t:Talent:Instagram)
USING JOIN ON t
WHERE t.iscore IS NOT NULL
AND t.iscore >= $iScoreFrom
AND t.iscore <= $iScoreTo
AND t.followers >= $followersFrom
AND t.followers <= $followersTo
AND t.avg_video_views >= $videoViewsFrom
AND t.avg_video_views <= $videoViewsTo
AND t.avg_engagement >= $avgEngagementFrom
AND t.avg_engagement <= $avgEngagementTo
AND t.is_identification_enabled = true
RETURN t AS talent, r AS belongsTo
ORDER BY t.iscore ASC
SKIP $offset LIMIT $limit;
One thing to note is that with the JOIN hint the number of DB hits skyrocketed, but it seems that the response time has been reduced, or at least that is the impression we got from profiling.
Results: Cypher version: CYPHER 3.5, planner: COST, runtime: SLOTTED. 544294 total db hits in 5295 ms.
Profile
Once deployed to our testing environment we could not note any improvement.
I find it very hard to explain how the latter query, using the JOIN hint, is more performant despite the huge number of DB hits, ~78% of our entire node population!
One thing that has been a hurdle to work around is cached query results; we are not able to benchmark different approaches to query tuning because of that. We could not find a way to disable caching; I'm not sure if CYPHER replan=force is the way to go, but it is only supported in v4.x.
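If CYPHER replan=force is indeed the mechanism, it would presumably be prepended to the query text like this (untested on our side, and v4.x only):

```cypher
CYPHER replan=force
MATCH (s:AudienceSegment {checksum: $checksum})<-[r:BELONGS_TO]-(t:Talent:Instagram)
RETURN count(*);
```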
That said, thanks!
Have a nice day.
k.
07-13-2020 05:57 AM
Can you also check your memory configuration, especially HEAP and page cache? And the pagecache hits/misses on your query, which should also show up in the profile? I would probably give a bit more to the page cache than to the heap, so that at least your nodes, their properties, the relationship store and the indexes are mapped. You can use apoc.warmup.run(true,false,true) to load the page cache.
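For example (if I recall the signature correctly, the three booleans control loading of properties, dynamic properties and indexes, respectively):

```cypher
// Pre-load the store into the page cache after a restart,
// so the first queries don't pay the cold-cache penalty.
CALL apoc.warmup.run(true, false, true);
```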
Using JOIN should help you, as it can do two index lookups, especially if you force an index use of Talent(iscore); then it can even keep the sorting from the index (at least in Neo4j 4.x, not sure if that was already in 3.5).
07-13-2020 06:00 AM
Ah, I just looked at your PROFILE; it's already doing that.
So please check the page-cache settings.
What is your SKIP/LIMIT size?
Something else I would check: are there any large properties that you're returning to the client?
Does it show the same behavior when you run it outside of your cluster with the same memory settings on a single instance / localhost?
Also, how selective is iscore? Is it correct that you get 58.5k results from the index, for $iScoreFrom < iscore < $iScoreTo?
The other thing that's odd is that all your property filters only reduce that node count from 58k to 56k?
Are some of those properties dependent?
You could try moving the filter on those additional properties to after the sort, just before the skip/limit.
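Roughly along these lines (a sketch: sort and page on the selective iscore predicate first, then apply the remaining filters to the small page; note this can return fewer than $limit rows when the later filters discard some):

```cypher
MATCH (s:AudienceSegment {checksum: $checksum})<-[r:BELONGS_TO]-(t:Talent:Instagram)
WHERE $iScoreFrom <= t.iscore <= $iScoreTo
WITH t, r
ORDER BY t.iscore ASC
SKIP $offset LIMIT $limit
// The remaining filters now run on at most $limit rows
WITH t, r
WHERE $followersFrom <= t.followers <= $followersTo
  AND $videoViewsFrom <= t.avg_video_views <= $videoViewsTo
  AND $avgEngagementFrom <= t.avg_engagement <= $avgEngagementTo
RETURN t AS talent, r AS belongsTo
ORDER BY t.iscore ASC;
```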
07-13-2020 07:15 AM
Hello Michael,
Thanks for your reply.
These were our configs:
- name: NEO4J_dbms_connector_bolt_thread__pool__min__size
value: "50"
- name: NEO4J_dbms_connector_bolt_thread__pool__max__size
value: "1000"
- name: NEO4J_dbms_connector_bolt_thread__pool__keep__alive
value: "1m"
- name: NEO4J_dbms_memory_heap_max__size
value: "12G"
- name: NEO4J_dbms_memory_heap_initial__size
value: "4G"
- name: NEO4J_causal__clustering_log__shipping__max__lag
value: "512"
Our pagecache setting was set to 512M.
Using JOIN should help you as it can do two index lookups, esp. if you force an index use of Talent(iscore)
Will add another Index hint.
What is your SKIP/LIMIT size?
Anywhere from 20-50 records.
Something else that I would check is if there are any properties that are large that you're returning to the client?
Not really, all values are either numeric or short strings.
Does it show the same behavior when you run it outside of your cluster with the same memory settings on a single instance / localhost?
Yes!
Also how selective is iscore? Is it correct that you get 58.5k results from the index? For iscoreFrom < iscore < iscoreTo ?
The other thing that's odd is that all your property filters only reduce that node-count from 58k to 56k ?
Are some of those properties dependent?
In the queries above we only filter by iScore, hence the low diff in the filtered node count. This is expected, since we are trying to address edge cases in our testing.
You could try to move the filter those additional properties after the sort just before the skip/limit.
Will give it a try.
We ran neo4j-admin memrec and got back the following recommendation, which we applied and noticed some improvements:
#
# Assuming the system is dedicated to running Neo4j and has 26100m of memory,
# we recommend a heap size of around 9000m, and a page cache of around 10600m,
# and that about 6500m is left for the operating system, and the native memory
# needed by Lucene and Netty.
#
# Tip: If the indexing storage use is high, e.g. there are many indexes or most
# data indexed, then it might advantageous to leave more memory for the
# operating system.
#
# Tip: The more concurrent transactions your workload has and the more updates
# they do, the more heap memory you will need. However, don't allocate more
# than 31g of heap, since this will disable pointer compression, also known as
# "compressed oops", in the JVM and make less effective use of the heap.
#
# Tip: Setting the initial and the max heap size to the same value means the
# JVM will never need to change the heap size. Changing the heap size otherwise
# involves a full GC, which is desirable to avoid.
#
# Based on the above, the following memory settings are recommended:
dbms.memory.heap.initial_size=9000m
dbms.memory.heap.max_size=9000m
dbms.memory.pagecache.size=10600m
Will keep you posted.
Thanks!
07-13-2020 07:53 AM
How big is your store on disk? I guess a 512M page cache is way too small.
Would you be able to share the data with me for some internal testing?
Please also consider upgrading to 4.1 for quite a number of improvements in the Cypher runtime.
07-13-2020 08:19 AM
It's ~20GB. I presume it is too small, yes; increased to 11Gi.
Would you be able to share the data with me for some internal testing?
Will send you a link to a backup on Slack.
Please also consider upgrading to 4.1 for quite a number of improvements in the Cypher runtime.
We will, had a conversation earlier with Abed about the upgrade, we have no reason not to.
Cheers.
07-14-2020 03:53 AM
Following up on some of the earlier suggestions:
Using JOIN should help you as it can do two index lookups, esp. if you force an index use of Talent(iscore)
Tried to add an additional index hint on Talent(iscore), but since we already have a JOIN hint on the same node, we got Multiple hints for same variable are not supported.
You could try to move the filter those additional properties after the sort just before the skip/limit.
Tried to do this too, but got syntax errors. Honestly, I wasn't aware that we could use WHERE statements after the RETURN statement; is this what you meant, or did I get it wrong?
I should also note that we have a read replica with 18GB of RAM, but after retrieving the configs it turned out to have the default heap and pagecache values of 512M.
We applied the same resource recommendation to the read replica, so that heap.initial_size and heap.max_size are set to 9Gi and pagecache.size to 11Gi, but we hardly noticed any improvement!
I understand that how requests are routed depends on how we use the driver, but what would be the optimal approach: route all read queries to read replicas while avoiding followers?
07-16-2020 04:42 PM
There are routing policies that route only to RRs, but I haven't used them myself; it should be in the docs.
Yes, you should check the config on all cluster members.
Did you run multiple queries or just one?
I tested it on 3.5 and 4.1 with 14G page-cache and had 50-200ms (worst case) results.
// 73ms
MATCH (s:AudienceSegment {checksum: $checksum})<-[r:BELONGS_TO]-(t:Talent:Instagram)
WHERE t.iscore IS NOT NULL
AND t.iscore >= $iScoreFrom
AND t.iscore <= $iScoreTo
AND t.followers >= $followersFrom
AND t.followers <= $followersTo
AND t.avg_video_views >= $videoViewsFrom
AND t.avg_video_views <= $videoViewsTo
AND t.avg_engagement >= $avgEngagementFrom
AND t.avg_engagement <= $avgEngagementTo
WITH s, r, t
WHERE r.affinity > 0 // <----
RETURN t AS talent, r AS belongsTo
ORDER BY r.affinity DESC
SKIP $offset LIMIT $limit;
// page-cache 14G
// heap 16G
:param avgEngagementFrom => 0
:param avgEngagementTo => 1000000000000
:param checksum => "ba92565be35fde27727bd375b9e67c35"
:param followersFrom => 0
:param followersTo => 1000000000000
:param iScoreFrom => 2
:param iScoreTo => 10
:param limit => 15
:param offset => 0
:param videoViewsFrom => 0
:param videoViewsTo => 1000000000000
MATCH (s:AudienceSegment {checksum: $checksum})<-[r:BELONGS_TO]-(t:Talent:Instagram)
USING JOIN ON t
WHERE $iScoreFrom <= t.iscore <= $iScoreTo
WITH t,r
ORDER BY t.iscore ASC
SKIP $offset LIMIT $limit
WITH t,r
WHERE $followersFrom <= t.followers <= $followersTo
AND $videoViewsFrom <= t.avg_video_views <= $videoViewsTo
AND $avgEngagementFrom <= t.avg_engagement <= $avgEngagementTo
RETURN t AS talent, r AS belongsTo
ORDER BY t.iscore ASC
SKIP $offset LIMIT $limit;
// 67ms
MATCH (s:AudienceSegment {checksum: $checksum})<-[r:BELONGS_TO]-(t:Talent:Instagram)
USING JOIN ON t
WHERE $iScoreFrom <= t.iscore <= $iScoreTo
AND $followersFrom <= t.followers <= $followersTo
AND $videoViewsFrom <= t.avg_video_views <= $videoViewsTo
AND $avgEngagementFrom <= t.avg_engagement <= $avgEngagementTo
RETURN t AS talent, r AS belongsTo
ORDER BY t.iscore ASC
SKIP $offset LIMIT $limit;
// 206ms
// 120ms in 4.1
cypher expressionEngine=compiled
MATCH (s:AudienceSegment {checksum: $checksum})<-[r:BELONGS_TO]-(t:Talent:Instagram)
WHERE $iScoreFrom <= t.iscore <= $iScoreTo
AND $followersFrom <= t.followers <= $followersTo
AND $videoViewsFrom <= t.avg_video_views <= $videoViewsTo
AND $avgEngagementFrom <= t.avg_engagement <= $avgEngagementTo
RETURN t AS talent, r AS belongsTo
ORDER BY t.iscore ASC
SKIP $offset LIMIT $limit;
// 55ms in 4.1
07-27-2020 08:49 AM
Hello Michael,
We've been trying to replicate your results, with no success.
The first time we run a query, its response time varies between 20s and 75s; any consecutive queries take ~200ms.
- Node: Google Cloud, 2 CPU, 32Gi RAM (29Gi allocatable)
- Docker container: 26Gi RAM
- Neo4j: v4.1.1
Since we shared the latest backup of our store with you, it has now grown to ~715k nodes and ~185M relationships.
Benchmarking with one of the above queries:
PROFILE MATCH (s:AudienceSegment {checksum: $checksum})<-[r:BELONGS_TO]-(t:Talent:Instagram)
USING JOIN ON t
WHERE $iScoreFrom <= t.iscore <= $iScoreTo
WITH t,r
ORDER BY t.iscore ASC
SKIP $offset LIMIT $limit
WITH t,r
WHERE $followersFrom <= t.followers <= $followersTo
AND $videoViewsFrom <= t.avg_video_views <= $videoViewsTo
AND $avgEngagementFrom <= t.avg_engagement <= $avgEngagementTo
RETURN t AS talent, r AS belongsTo
ORDER BY t.iscore ASC
SKIP $offset LIMIT $limit;
Checksum: 52cd37fc3f4d3ab81f85f5d4cf03bfb3
Profile