Why do these two queries differ a lot in speed?

lingvisa
MATCH (m)
WHERE  m.name = "XPhone"
RETURN properties(m) as properties LIMIT 1

And

MATCH (m:Product)
WHERE  m.name = "XPhone"
RETURN properties(m) as properties LIMIT 1

In Neo4j Desktop, I observed that the first query takes 1605 ms and the second takes 32 ms. My graph has 350,000 nodes in total, 90,000 of which are Products. It's quite a small graph, and I have an index on the 'name' property of all nodes. In both cases the queries return nothing, since XPhone is not in the graph. It seems the index didn't take effect; otherwise the first query wouldn't be so slow. Right?

In the first case, even with an index on 'name', does the query still have to go through every node in the graph? Then how is that different from having no index at all?
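One way to check this is to look at the existing indexes and at the query plans. This sketch assumes Neo4j 4.2 or later for `SHOW INDEXES` (older 4.x versions offer `CALL db.indexes()` instead); `EXPLAIN` shows the plan without running the query. As far as I know, the unlabelled query should plan as an `AllNodesScan` followed by a filter, while the labelled one should show a `NodeIndexSeek` if an index exists on `:Product(name)` and a `NodeByLabelScan` otherwise.

```cypher
// Which indexes exist, and which label(s) and properties each one covers
SHOW INDEXES;

// Plan only, no execution: with no label, the planner has to start from every node
EXPLAIN
MATCH (m)
WHERE m.name = "XPhone"
RETURN properties(m) AS properties LIMIT 1;

// With the label, the planner can start from the :Product nodes (or an index on :Product(name))
EXPLAIN
MATCH (m:Product)
WHERE m.name = "XPhone"
RETURN properties(m) AS properties LIMIT 1;
```

In Neo4j Browser, running the statements one at a time makes each plan easier to read.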

Hi @lingvisa

MATCH (m:Product)
WHERE  m.name = "XPhone"
RETURN properties(m) as properties LIMIT 1

The query above behaves roughly like the one below:
the label narrows down which nodes have to be checked.

MATCH (m)
  WHERE  m.name = "XPhone"
  AND "Product" IN labels(m)
RETURN properties(m) as properties LIMIT 1

lingvisa

So an index only works together with a label, and when no label is specified the query doesn't use the index at all. Right? Is a full-text query then the only way to speed up queries when the labels are unknown? In practice, labels are often not known before querying the graph. Any recommendations for making queries faster in such cases?

Correct: indexes are defined on a label and one or more node properties.
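For the original question, that means the index has to be declared on the `:Product` label. A minimal sketch in Neo4j 4.x syntax (on 3.5 the equivalent is `CREATE INDEX ON :Product(name)`); the index name `product_name` is only illustrative.

```cypher
// Index on one label and one property
CREATE INDEX product_name FOR (m:Product) ON (m.name);

// The labelled query can then be answered with a NodeIndexSeek
MATCH (m:Product)
WHERE m.name = "XPhone"
RETURN properties(m) AS properties LIMIT 1;
```

A newly created index is populated in the background; `CALL db.awaitIndexes()` waits until that has finished.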
Without a label I don't see how performance can be improved. For example, suppose my model described the contents of a library, so that there were nodes labelled

:Book
:Music
:ArtWork
:Magazines
:Newspapers

and in total this represented 100 million nodes.

If you then ask match (n) where n.author='JSmith' return n; you are effectively saying "find me something, anything, in my library whose author is 'JSmith'", and so we must do an AllNodesScan (i.e. over all 100 million nodes). Furthermore, the results might include both a :Book and an :ArtWork node.

Now if we simply change this to match (n:ArtWork) where n.author='JSmith' return n; then we do a NodeByLabelScan and presumably scan far fewer than 100 million nodes. And if there is an index on :ArtWork(author), a NodeIndexSeek is used instead, further reducing the amount of data to be searched.
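When the set of candidate labels is known in advance, one possible workaround (a suggestion, not something stated in this thread) is to spell out each label and combine the results with `UNION`, so that each branch can use its own index. This sketch assumes indexes on `author` exist for `:Book`, `:Music` and `:ArtWork`; a branch without an index falls back to a label scan.

```cypher
// Each branch anchors on one label, so it can use a :Label(author) index if one exists.
// UNION removes duplicate rows, e.g. a node carrying both :Book and :Music.
MATCH (n:Book) WHERE n.author = 'JSmith' RETURN n
UNION
MATCH (n:Music) WHERE n.author = 'JSmith' RETURN n
UNION
MATCH (n:ArtWork) WHERE n.author = 'JSmith' RETURN n;
```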

This behaviour shuts off most of Neo4j's power. I might be wrong, and I invite any Neo4j staff with more experience to say so if I am, but you're stuck.

There are only two pure-Cypher ways to use indexes in Neo4j:
Specifying a label without a property filter (the label itself acts as an index)
Specifying a label with a property filter (you need to create an index for it)

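Otherwise you can use a full-text index (built into Neo4j since 3.5; older setups used APOC for this), but you still need to anchor on something, because a full-text index is also declared on specific labels.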
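A sketch of the built-in full-text route for the library example, assuming Neo4j 4.3+ for the `CREATE FULLTEXT INDEX` syntax (3.5 to 4.2 use the `db.index.fulltext.createNodeIndex` procedure instead). The index name `libraryAuthors` is hypothetical. The index is still declared on an explicit list of labels, which is the "anchor" mentioned above.

```cypher
// One full-text index spanning several labels ("libraryAuthors" is an illustrative name)
CREATE FULLTEXT INDEX libraryAuthors
FOR (n:Book|Music|ArtWork|Magazines|Newspapers) ON EACH [n.author];

// The second argument is a Lucene query string, matched against the analyzed text;
// each hit comes back with a relevance score.
CALL db.index.fulltext.queryNodes("libraryAuthors", "JSmith") YIELD node, score
RETURN node, score;
```

Full-text matching works on analyzed tokens rather than exact equality, so an extra `WHERE node.author = 'JSmith'` is needed if only exact matches should be returned.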

I think you need to rethink the problem, or write a program that builds the query for you from some parameters. Either way, the query must specify at least labels/relationship types for your anchor nodes if you expect any performance benefit at large scale.

The usual approach to a global index is to add a generic label such as :Node or :Entity to all of those nodes, and use that label in the index and in your queries.
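A minimal sketch of that approach, using `:Entity` as the generic label and `name` as the property; the label and index names are illustrative. The one-off `SET` touches every node, which should be fine at around 350,000 nodes but would need batching on much larger graphs (for example `CALL { ... } IN TRANSACTIONS` on 4.4+, or `apoc.periodic.iterate`).

```cypher
// One-off: add the generic label to every existing node
MATCH (n) SET n:Entity;

// Global index on the generic label
CREATE INDEX entity_name FOR (n:Entity) ON (n.name);

// Queries anchor on :Entity instead of leaving the label out
MATCH (m:Entity)
WHERE m.name = "XPhone"
RETURN properties(m) AS properties LIMIT 1;
```

New nodes need the generic label as well, e.g. `CREATE (:Product:Entity {name: "XPhone"})`; otherwise they will not be found through this index.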

That sounds right. So a global index on :Node(name) should be faster than no global index when querying on a node's name property, right? In my case, I mainly query on the name property of nodes.

I believe using a label reduces the number of scanned nodes and improves search speed on a property whether or not that property is indexed.

Right, but it still has to scan all nodes with that label, whereas the index picks out only the few nodes with the matching property value for that label.
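The difference can be observed with `PROFILE` and planner hints. This sketch assumes an index on `:Product(name)` already exists (`USING INDEX` raises an error otherwise); `USING SCAN` forces the label scan for comparison. Comparing the db hits reported for the two plans shows how much work each one does.

```cypher
// Forced label scan: reads every :Product node, then filters on name
PROFILE
MATCH (m:Product)
USING SCAN m:Product
WHERE m.name = "XPhone"
RETURN m;

// Index seek: looks up only the :Product nodes whose name is "XPhone"
PROFILE
MATCH (m:Product)
USING INDEX m:Product(name)
WHERE m.name = "XPhone"
RETURN m;
```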