Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
07-24-2019 03:13 AM
I have nodes with properties as follows:
I have created the following indexes:
create index on :Document(uid)
create index on :Document(typeName)
create index on :Document(uid, typeName)
Now the issue I am facing is when I try to run the following query, its very slow with all the indexes in place.
profile match (doc1:Document)-[*]->(doc2:Document)
where doc1.uid = "58ad88227faf733d28c8c4ee" and doc2.typeName = "Disease"
return doc2
Once I drop the following indexes and retain the typeName
index the same query runs blazing fast.
drop index on :Document(uid)
drop index on :Document(uid, typeName)
Seems like when it tries to use the uid index its slowing things down. Why would that be.
Solved! Go to Solution.
07-26-2019 11:08 AM
No, this has nothing to do with indexing itself, the lookups of just the nodes are fast in either case.
The problem is the number of possible path matches expanding the pattern from one node vs the other. There are 54k possible paths to evaluate when expanding from doc1
, which filters down to only 9, but it's doing a ton of work that it doesn't need to do. You DON'T want to find and filter every possible path between the nodes, you just want to know if they're connected, and shortestPath()
uses a bidirectional bfs expansion to do this, and once a single path is found it stops looking, where in your original query it's exhaustive and will find all possible paths between the pair of nodes, not stopping after the first is found.
07-24-2019 09:44 AM
What version of Neo4j?
Do you have the results of the PROFILE / EXPLAIN to show what it is doing?
I would expect your initial experience when having both indexes should result in to NodeIndexSeeks? if it does not you may want to add a USING INDEX hint and thus
profile match (doc1:Document)-[*]->(doc2:Document)
using index doc1:Document(uid)
where doc1.uid = "58ad88227faf733d28c8c4ee" and doc2.typeName = "Disease"
return doc2
07-24-2019 11:30 AM
We'll need PROFILE plans of the queries to see what's going on.
My guess though is that when starting from doc2
, index lookup is fast, and following the path back from that document to doc1
and filtering is an easy operation, likely there aren't many paths to evaluate.
We might see that if we start from doc1
that there could be a great many paths to consider when finding doc2
, so it could be the graph structure, and the number of possible paths out, that may be slowing this down, and not something that's directly index related. The plans may confirm this.
I should also point out that this approach for the query likely won't meet your needs, as the query is looking for all possible paths between these nodes (not just one...it won't stop until all possible paths are explored and evaluated).
Since you're just trying to see if the two nodes are connected, try using shortestPath()
instead (it will stop when the first path is found), it will do a bidirectional bfs expansion and may be faster, depending on the graph structure between them:
profile
match shortestPath((doc1:Document)-[*]->(doc2:Document))
where doc1.uid = "58ad88227faf733d28c8c4ee" and doc2.typeName = "Disease"
return doc2
07-25-2019 02:15 AM
Here are the profile sceenshots.
with only typename index
07-25-2019 12:44 PM
Yep, that confirms my assumptions. Note that the expand when we start from doc1
generates 54k rows of results, and it takes a bit of time for that expand and filtering down, and when we expand from doc2
instead there are only 11 rows.
Now try profiling with the query I provided using shortestPath().
07-26-2019 03:46 AM
With both uid and typename index in place
could it be that as typename index doesn't have as many index as uid index, with only typename index its faster. or could it be the mongo IDs in uid index is hitting some kinda size limitation.
07-26-2019 11:08 AM
No, this has nothing to do with indexing itself, the lookups of just the nodes are fast in either case.
The problem is the number of possible path matches expanding the pattern from one node vs the other. There are 54k possible paths to evaluate when expanding from doc1
, which filters down to only 9, but it's doing a ton of work that it doesn't need to do. You DON'T want to find and filter every possible path between the nodes, you just want to know if they're connected, and shortestPath()
uses a bidirectional bfs expansion to do this, and once a single path is found it stops looking, where in your original query it's exhaustive and will find all possible paths between the pair of nodes, not stopping after the first is found.
07-29-2019 01:45 AM
Understood thanks for the suggestion
All the sessions of the conference are now available online