cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Querying on multiple labels and property values appear not to use indexes

Hi

I've been experimenting with Neo4J 3.5.0 and looking at one particular query. My initial work was based on a Gremlin query which was performing badly, and i've now tried several different constructions in Gremlin as well as now trying various forms in Cypher.

The main thing thing is being able to query for different node labels both of which share a common property and additionally checking for values in a list. Given the size of graph, 500M nodes i'd like to have it use indexes (all of which are created) but so far it seems to resort to a FILTER (for the property match) over the matched nodes (from matching the label).

I've tried the simple form:

MATCH (a) WHERE (a:label1 or a:label2 or a:label3) AND a.propName in ['val1','val2','val3'] RETURN count(a)

Then i've tried several different constructions using more elaborate OR and UNION. The closest i've come to is:

MATCH (a:label1) WHERE a.propName in ['val1','val2','val3'] RETURN a
UNION
MATCH (a:label2) WHERE a.propName in ['val1','val2','val3'] RETURN a
UNION
MATCH (a:label3) WHERE a.propName in ['val1','val2','val3'] RETURN a

However this yields a list of nodes rather than the simpler count i was aiming for.

I was looking for any suggestions for getting the simpler type of query working effeciently, or a way to adapt the more verbose query to get a (distinct) count.

Thanks for any help.

Tony

1 ACCEPTED SOLUTION

Provided that you don't have any nodes multi-labeled with these labels, you could just do 3 OPTIONAL MATCHes and add up the counts for each:

OPTIONAL MATCH (n:label1) 
WHERE n.propName in ['val1','val2','val3'] 
WITH count(n) as count
OPTIONAL MATCH (a:label2) 
WHERE n.propName in ['val1','val2','val3'] 
WITH count + count(n) as count
OPTIONAL MATCH (n:label3) 
WHERE n.propName in ['val1','val2','val3'] 
RETURN count + count(n) as count

If the nodes can be multi-labeled, then the problem becomes more difficult, as we would need to collect the results and de-duplicate and finally get the final count.

That said...the fact that the same propName is used for all 3 labels, and the values can exist across them, suggests a relationship among these kinds of nodes. Would it make sense for your model to add an additional label for nodes of these types, that way you could more generically match using the super label (after creating an index for it)? Alternately, would it make sense to create any graph structures between nodes like this, and match to some common node which is related and use that as a starting point?

View solution in original post

5 REPLIES 5

Provided that you don't have any nodes multi-labeled with these labels, you could just do 3 OPTIONAL MATCHes and add up the counts for each:

OPTIONAL MATCH (n:label1) 
WHERE n.propName in ['val1','val2','val3'] 
WITH count(n) as count
OPTIONAL MATCH (a:label2) 
WHERE n.propName in ['val1','val2','val3'] 
WITH count + count(n) as count
OPTIONAL MATCH (n:label3) 
WHERE n.propName in ['val1','val2','val3'] 
RETURN count + count(n) as count

If the nodes can be multi-labeled, then the problem becomes more difficult, as we would need to collect the results and de-duplicate and finally get the final count.

That said...the fact that the same propName is used for all 3 labels, and the values can exist across them, suggests a relationship among these kinds of nodes. Would it make sense for your model to add an additional label for nodes of these types, that way you could more generically match using the super label (after creating an index for it)? Alternately, would it make sense to create any graph structures between nodes like this, and match to some common node which is related and use that as a starting point?

Query cannot conclude with WITH (must be a RETURN clause, an update clause, a unit subquery call, or a procedure call with no YIELD) (line 3, column 1 (offset: 54))
"WITH COUNT(n) As COUNT" PLEASE EXPLAIN THIS ERROR

it looks like your query terminates with a ‘with’ clause.  This is not allowed. You need to end with a ‘return’ clause. You can optionally add any of the following ‘order by’, ‘skip’, and ‘limit’ after a ‘return’ clause.  Replace ‘with’ with ‘return’ to resolve the error. 

HI

Thanks for the informatio, I had tried something very similar but without the OPTIONAL and got zero, i've tried with the OPTIONAL and that gets me the right count. The graph model in question is something we will be likely looking at but for now i was just getting a first pass over some queries and this seemed like one that should be easy enough to get working optimally.

For the Gremlin equivalent it doesnt look like the use of WITHIN to match the multiple property values may be using the index and using UNIONs or ORs it seemed to fall back to the same kind of query plan but i'll have another play with that one tomorrow, at least for Cypher there is a workable solution.

Thanks again

Tony

It should work too if you just do multiple matches with different variable names.
But then you span up a cartesian product and need to do count(distinct n) which is probably not in your interest.

So Andrew's solution should work best.