Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
10-02-2018 02:26 PM
I am trying to run a spatial bounding box search with the use of a spatial index.
In my graph, there are approximately 1.4 million nodes with a label of :ItemUsageInstance
(there are other nodes beyond those as well). These :ItemUsageInstance
nodes have a uniqueness constraint called instance_id
. There are also 3 point
properties on these node labels that are indexes: volume_min_coord
, volume_centroid
, volume_max_coord
.
What I am trying to do is, given a particular :ItemUsageInstance
node, find the closest other :ItemUsageInstance
s whose volume_centroid
is within a certain bounding box of the first ItemUsageInstance. While my query "works", it takes ~12 seconds and has 4359092 db hits. It does not seems like I am fully realizing the benefit of the spatial indexing.
Query so far:
MATCH (i:ItemUsageInstance {instance_id: '000000000123456'})
WITH i,
i.volume_min_coord.x + 50 AS upper_x, i.volume_max_coord.x - 50 AS lower_x,
i.volume_min_coord.y + 50 AS upper_y, i.volume_max_coord.y - 50 AS lower_y,
i.volume_min_coord.z + 50 AS upper_z, i.volume_max_coord.z - 50 AS lower_z
MATCH (other:ItemUsageInstance)
WHERE id(i) <> id(other)
AND point({x: lower_x, y: lower_y, z: lower_z}) < other.volume_centroid < point({x: upper_x, y: upper_y, z: upper_z})
RETURN distance(i.volume_centroid, other.volume_centroid) AS dist, other.instance_id
ORDER BY dist LIMIT 25;
If I PROFILE
the query, here is what I get:
That said, creating the index on the volume_centroid
property did seem to provide some modest improvement. I had initially forgotten to do so, and without the volume_centroid
index, there were 5805244 total db hits in 12188 ms
. After the volume_centroid
index was ONLINE
, the db hits reduced to: 4359092 total db hits in 12546 ms
Maybe my expectations are not realistic, but it seems like it ought to be doing much better. What can I do to improve the performance?
Result from :schema
:
Indexes
ON :ItemUsageInstance(volume_centroid) ONLINE
ON :ItemUsageInstance(volume_max_coord) ONLINE
ON :ItemUsageInstance(volume_min_coord) ONLINE
ON :ItemUsageInstance(instance_id) ONLINE (for uniqueness constraint)
ON :Item(item_id) ONLINE (for uniqueness constraint)
ON :Material(material_id) ONLINE (for uniqueness constraint)
Constraints
ON ( item:Item ) ASSERT item.item_id IS UNIQUE
ON ( itemusageinstance:ItemUsageInstance ) ASSERT ItemUsageInstance.instance_id IS UNIQUE
ON ( material:Material ) ASSERT material.material_id IS UNIQUE
If I drop the LIMIT 25
from the query, there are: 2994 rows available after 12209 ms, consumed after another 8 ms
Version: 3.4.5
Edition: community
10-04-2018 06:07 AM
Thanks for reporting this. We also recently noticed this issue and made a fix for it, but unfortunately the fix was broader in scope than just the spatial index and as a result will only come out in Neo4j 3.5. This is currently in beta, so you could try and see if it works for your case.
The issue was that the query planner would not plan an index seek if the predicate expression included deeper dependencies, and in the case of the spatial bounding box query this meant it would only plan the index if the max and min expressions were literal point()
functions with no dependencies. In your query you are depending on previously defined lower_x
etc. This is a reasonable thing to do, of course, and so we have enhanced the planner to deal with this case.
The kinds of queries for which we can now plan the index seek include ones like this one in our acceptance tests: https://github.com/neo4j/neo4j/blob/3.5/enterprise/cypher/acceptance-spec-suite/src/test/scala/org/n...
One curious aspect of this issue is that the index planning for the distance function actually worked a lot better already in Neo4j 3.4. This means that you can still get Cypher to plan the index in Neo4j 3.4 if you model a circle around your bounding box and use the distance
function in the predicate instead.
So your options for improving performance are:
distance
function instead to get Neo4j 3.4 to use the spatial indexFor this second option you could calculate a distance threshold as the maximum of the distances between the centroid and the max and min points, and then use that in a predicate like:
distance(i.volume_centroid, other_volume_centroid) < dist_threshold
Note: The fix was only merged recently, and did not make the Neo4j 3.5 Beta01 release, but should be in the Neo4j 3.5 Beta02 release.
All the sessions of the conference are now available online