Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
02-16-2021 04:08 PM
I am noticing some exorbitant and possibly inaccurate memory usage on the Sort operator when profiling queries that use an ORDER BY clause. I will use a contrived example but it is apparent anytime I do a PROFILE on a query with an ORDER BY.
Here is an example of a simple query generating 1000 random numbers without sorting them:
profile
with [i in range(0,1000) | rand()] as rands
unwind rands as r
return r
And the associated PROFILE execution plan:
Note the memory usage of just 72 bytes on the ProduceResults operator.
Now here is the same query just sorted:
profile
with [i in range(0,1000) | rand()] as rands
unwind rands as r
return r
order by r
And the associated PROFILE execution plan:
+----------------------------------------------------------------------------------------------------------+
| Plan | Statement | Version | Planner | Runtime | Time | DbHits | Rows | Memory (Bytes) |
+----------------------------------------------------------------------------------------------------------+
| "PROFILE" | "READ_ONLY" | "CYPHER 4.2" | "COST" | "PIPELINED" | 14 | 0 | 100001 | 240011257016 |
+----------------------------------------------------------------------------------------------------------+
+-----------------------+--------------------------------------------------------+----------------+--------+---------+-----------+-----------+----------------+------------+--------------------------+
| Operator | Details | Estimated Rows | Rows | DB Hits | Cache H/M | Time (ms) | Memory (Bytes) | Ordered by | Other |
+-----------------------+--------------------------------------------------------+----------------+--------+---------+-----------+-----------+----------------+------------+--------------------------+
| +ProduceResults@neo4j | r | 10 | 100001 | 0 | 0/0 | 35.126 | | r ASC | 35125600; In Pipeline 1 |
| | +--------------------------------------------------------+----------------+--------+---------+-----------+-----------+----------------+------------+--------------------------+
| +Sort@neo4j | r ASC | 10 | 100001 | 0 | 0/0 | 178.893 | 240011257016 | r ASC | In Pipeline 1; 178893300 |
| | +--------------------------------------------------------+----------------+--------+---------+-----------+-----------+----------------+------------+--------------------------+
| +Unwind@neo4j | rands AS r | 10 | 100001 | 0 | | | | | Fused in Pipeline 0 |
| | +--------------------------------------------------------+----------------+--------+---------+-----------+-----------+----------------+------------+--------------------------+
| +Projection@neo4j | [i IN range($autoint_0, $autoint_1) | rand()] AS rands | 1 | 1 | 0 | | | | | Fused in Pipeline 0 |
+-----------------------+--------------------------------------------------------+----------------+--------+---------+-----------+-----------+----------------+------------+--------------------------+
100001 rows available after 13 ms, consumed after another 1 ms
That's 240 GB! However my memory usage did not go up when watching Windows Task Manager while it was executing. Notably, 10k yielded 2.4GB. I also observe the same behavior when sorting in a WITH clause before the return. Here is some install data that might be helpful:
2021-02-16 23:45:10.690+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=190, gcTime=238, gcCount=1}
2021-02-16 23:45:11.155+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=157, gcTime=238, gcCount=1}
2021-02-16 23:45:11.682+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=107, gcTime=193, gcCount=1}
2021-02-16 23:45:12.116+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=131, gcTime=201, gcCount=1}
2021-02-16 23:45:12.569+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=251, gcTime=196, gcCount=1}
2021-02-16 23:45:13.694+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=158, gcTime=156, gcCount=1}
2021-02-16 23:45:14.026+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=131, gcTime=166, gcCount=1}
2021-02-16 23:45:14.577+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=115, gcTime=191, gcCount=1}
2021-02-16 23:45:15.056+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=177, gcTime=220, gcCount=1}
2021-02-16 23:45:15.385+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=128, gcTime=156, gcCount=1}
2021-02-16 23:45:15.732+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=140, gcTime=156, gcCount=1}
2021-02-16 23:45:17.319+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=1247, gcTime=0, gcCount=0}
2021-02-16 23:45:25.815+0000 WARN [o.n.k.i.c.VmPauseMonitorComponent] Detected VM stop-the-world pause: {pauseTime=182, gcTime=209, gcCount=1}
If you could give some insight into if this memory reporting is intended behavior and/or a more accurate way of measuring query memory usage when sorting it would be greatly appreciated! Thank you!
02-17-2021 09:37 AM
Maybe try using apoc.coll.sort()
function on the collection before unwinding it?
with apoc.coll.sort([i in range(0,1000) | rand()]) as rands
unwind rands as r
return r
see:
I'm not sure why the builtin ORDER BY
would be (potentially worse) than the apoc.coll.sort
but my guess is that sorting a list using Java would be less resource intensive than ordering DB objects.
Also... if you want to sort by Nodes based on a Node's property:
and after you get back the sorted Neo4J List, you can do UNWIND
to get back nodes.
Note: missing from the documentation for apoc.coll.sortNodes
, is if you want ascending sort by a property name, you need to prefix the property name with ^
, like '^name'
All the sessions of the conference are now available online