cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Indexing mongo IDs

I have nodes with properties as follows:

  • uid: this mongo object ID string.
  • typeName: a regular string field.

I have created the following indexes:

create index on :Document(uid)
create index on :Document(typeName)
create index on :Document(uid, typeName)

Now the issue I am facing is when I try to run the following query, its very slow with all the indexes in place.

profile match (doc1:Document)-[*]->(doc2:Document)
where doc1.uid = "58ad88227faf733d28c8c4ee" and doc2.typeName = "Disease"
return doc2

Once I drop the following indexes and retain the typeName index the same query runs blazing fast.

drop index on :Document(uid)
drop index on :Document(uid, typeName)

Seems like when it tries to use the uid index its slowing things down. Why would that be.

1 ACCEPTED SOLUTION

No, this has nothing to do with indexing itself, the lookups of just the nodes are fast in either case.

The problem is the number of possible path matches expanding the pattern from one node vs the other. There are 54k possible paths to evaluate when expanding from doc1, which filters down to only 9, but it's doing a ton of work that it doesn't need to do. You DON'T want to find and filter every possible path between the nodes, you just want to know if they're connected, and shortestPath() uses a bidirectional bfs expansion to do this, and once a single path is found it stops looking, where in your original query it's exhaustive and will find all possible paths between the pair of nodes, not stopping after the first is found.

View solution in original post

7 REPLIES 7

What version of Neo4j?
Do you have the results of the PROFILE / EXPLAIN to show what it is doing?
I would expect your initial experience when having both indexes should result in to NodeIndexSeeks? if it does not you may want to add a USING INDEX hint and thus

profile match (doc1:Document)-[*]->(doc2:Document)
using index doc1:Document(uid) 
where doc1.uid = "58ad88227faf733d28c8c4ee" and doc2.typeName = "Disease"
return doc2

We'll need PROFILE plans of the queries to see what's going on.

My guess though is that when starting from doc2, index lookup is fast, and following the path back from that document to doc1 and filtering is an easy operation, likely there aren't many paths to evaluate.

We might see that if we start from doc1 that there could be a great many paths to consider when finding doc2, so it could be the graph structure, and the number of possible paths out, that may be slowing this down, and not something that's directly index related. The plans may confirm this.

I should also point out that this approach for the query likely won't meet your needs, as the query is looking for all possible paths between these nodes (not just one...it won't stop until all possible paths are explored and evaluated).

Since you're just trying to see if the two nodes are connected, try using shortestPath() instead (it will stop when the first path is found), it will do a bidirectional bfs expansion and may be faster, depending on the graph structure between them:

profile 
match shortestPath((doc1:Document)-[*]->(doc2:Document))
where doc1.uid = "58ad88227faf733d28c8c4ee" and doc2.typeName = "Disease"
return doc2

Here are the profile sceenshots.

with only typename index


with uid and typename index both

Yep, that confirms my assumptions. Note that the expand when we start from doc1 generates 54k rows of results, and it takes a bit of time for that expand and filtering down, and when we expand from doc2 instead there are only 11 rows.

Now try profiling with the query I provided using shortestPath().

With both uid and typename index in place


with only typename index in place

could it be that as typename index doesn't have as many index as uid index, with only typename index its faster. or could it be the mongo IDs in uid index is hitting some kinda size limitation.

No, this has nothing to do with indexing itself, the lookups of just the nodes are fast in either case.

The problem is the number of possible path matches expanding the pattern from one node vs the other. There are 54k possible paths to evaluate when expanding from doc1, which filters down to only 9, but it's doing a ton of work that it doesn't need to do. You DON'T want to find and filter every possible path between the nodes, you just want to know if they're connected, and shortestPath() uses a bidirectional bfs expansion to do this, and once a single path is found it stops looking, where in your original query it's exhaustive and will find all possible paths between the pair of nodes, not stopping after the first is found.

Understood thanks for the suggestion