Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
11-14-2018 07:23 AM
We've been slowly building up a Neo4j database (v.3.4.4 community edition) of genetic analyses. Up until the past week, we've been working with relatively small datasets and fleshing out the relationships, etc. Everything had been going quite well, so we decided to start loading considerably more data to see if the performance stayed consistent.
Now, some of the compound indexes in place appear to have stopped being used, which is quite disconcerting. I'm hoping with some information either the err of our ways can be pointed out or I can learn whether or not something else is wrong.
One of the node labels is :Variant
, which would look like so:
{
name: "X:843593:C:TG",
chromosome: "X",
position: 843593,
reference: "C",
alt: "TG"
}
The :Variant
label also has the following, compound index defined (and I can confirm its existence with call db.indexes()
😞
CREATE INDEX ON :Variant(chromosome, position)
Again, up until recently, this seemed to be working fine with about 1M variant nodes in the database. After loading ~8M more variant nodes (and ~100M other analyses nodes), even a simple query like counting variants in a range takes way too long:
MATCH (n:Variant)
WHERE n.chromosome="8"
AND n.position>2000000
AND n.position<3000000
RETURN count(n)
The above query can take > 6 seconds to run, and only counted ~6000 nodes. When I go to explain/profile the query, I get the following output (note how chromosome is nowhere in the filter, but just a simple test on the position only):
It's possible that it's never used the compound index and we previously had so few nodes that the performance cost of counting everything was negligible. Using an index hint does not help either, although I wouldn't expect it to for a simple query like the above (it either knows of the index and uses it where appropriately or it doesn't think it can).
So... what are we doing wrong?
:Variant
nodes existed for a period of time?As a final point of reference, a MySQL table of variants with the same compound index, 10x the number of records, and on similar hardware returns the count in < 100 ms. This isn't to knock Neo4j, but rather it's my personal gut-check to know if what I'm seeing is within the realm of reasonable, and right now it isn't. So, I'm sure that something is either setup wrong or not correctly being used.
Thanks for any insights!
11-14-2018 12:22 PM
Hi Massung,
Composite indexes only support exact lookups (either using =
or list membership with IN
predicates), and won't work with range comparisons. This is noted in the range comparison section of the index documentation:
Composite indexes are currently not able to support range comparisons.
I've requested to our documentation team that we also mention this in the composite index description section, as its current location doesn't make this easily discoverable when you're just reading up on the sections on composite index characteristics.
To support your usage, you'll want (in addition) a single index on :Variant(position) and/or :Variant(chromosome), which will allow an index lookup on one field or the other, and then filter on the remaining field, depending on what is recommended by the planner.
11-14-2018 12:36 PM
Thanks for the follow-up.
I ended up finding that tidbit in the documentation about an hour after posting. But, for a test, I created a compound index on :Variant(chromosome, reference)
and did an =
test (on both fields) and it didn't use it in the planner.
Similarly, I would have expected the first key in the compound index (chromosome) to be used even if the range part (position) wasn't possible. Is a compound index only used if all fields in the query can be used for it? As opposed to SQL where as much of the index will be used as possible?
Jeff
11-14-2018 03:31 PM
Hi Massung,
You're correct that the composite index is only used when all fields that make up the composite index are used for lookup. It's not a replacement for single-field indexes, which have more capability and have a different backing structure.
As for your test, can you supply instructions to create a simple test graph where that behavior is reproducible? We'd want to take a look at that to see if there's a bug in play.
All the sessions of the conference are now available online