Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
08-22-2018 07:42 PM
Some use cases require matching to nodes which aren't connected to any of some other set of nodes.
We'll discuss both incorrect and correct approaches to this kind of query.
For our examples we'll use a recipe graph, where the primary structure is as follows:
(:Recipe)-[:INCLUDES]->(:Ingredient)
The use case is, provided a list of ingredient names to exclude, to match to recipes which do not contain any of those ingredients.
A common incorrect approach does not use collections, but assumes (wrongly) that the pattern negated treats the variable values as a collection.
MATCH (excluded:Ingredient)
WHERE excluded.name in $excludedIngredients
MATCH (r:Recipe)
WHERE NOT (r)-[:INCLUDES]->(excluded)
RETURN r
For a single excluded ingredient, the above query will work just fine.
However, when $excludedIngredients
contains several entries, this query may fail to produce correct results.
The reason is because each excluded ingredient is on its own record/row, and will be tested individually, which will mean some recipes that have one of the excluded ingredients, but not all, will not be filtered correctly.
For example, let's say the parameters are as follows: {excludedIngredients:['eggs', 'walnuts']}
Let's say a Chocolate Cake
:Recipe is being evaluated, which contains eggs, but not walnuts.
Because each excluded ingredient is on its own record, if we inspected the built-up records at the time of evaluation, it may look like this:
|===
| excluded.name | r.name
| eggs | Chocolate Cake
| walnuts | Chocolate Cake
|===
The build up records are for each single ingredient with each recipe that contains it.
These are not arranged into collections automatically (that would require use of collect()
)
Each record will be independently evaluated: WHERE NOT (r)-[:INCLUDES]->(excluded)
The first record will be evaluated, and as the Chocolate Cake
contains eggs, the record will be eliminated.
The second record will be evaluated, and as the Chocolate Cake
does not contain walnuts, the record is kept.
The Chocolate Cake :Recipe will be returned as a result, which is clearly incorrect.
The correct approach is to use collection membership to drive the exclusion. There are actually many similar correct queries that start from this approach, some with varying efficiency, depending on the actual graph:
MATCH (excluded:Ingredient)
WHERE excluded.name in $excludedIngredients
WITH collect(excluded) as excluded
MATCH (r:Recipe)-[:INCLUDES]->(i)
WITH excluded, r, collect(i) as ingredients
WHERE NONE (i in ingredients where i in excluded)
RETURN r
We can do the same thing using pattern comprehension to expand and collect all at once.
MATCH (excluded:Ingredient)
WHERE excluded.name in $excludedIngredients
WITH collect(excluded) as excluded
MATCH (r:Recipe)
WITH excluded, r, [(r)-[:INCLUDES]->(i) | i] as ingredients
WHERE NONE (i in ingredients where i in excluded)
RETURN r
Or we can avoid collecting ingredients for each recipe and instead use NONE() to exclude the pattern of the relationship having any of the excluded ingredients.
MATCH (excluded:Ingredient)
WHERE excluded.name in $excludedIngredients
WITH collect(excluded) as excluded
MATCH (r:Recipe)
WHERE NONE(i in excluded WHERE (r)-[:INCLUDES]->(i))
RETURN r
An alternate approach doesn't use WHERE NONE(), but instead uses an OPTIONAL MATCH from a :Recipe, but only to the nodes we want to exclude.
If there isn't a match to any of the excluded nodes, the newly introduced variable i
will be null, and we can filter with another WHERE clause to get only those rows.
MATCH (excluded:Ingredient)
WHERE excluded.name in $excludedIngredients
WITH collect(excluded) as excluded
MATCH (r:Recipe)
OPTIONAL MATCH (r)-[:INCLUDES]->(i)
WHERE i in excluded
WITH r
WHERE i IS NULL
RETURN r
All the sessions of the conference are now available online