cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Optional match giving wrong result

ri8ika
Graph Voyager

This gives me wrong result, an unexpected behavior:

MATCH (n) WHERE id(n) = 123
OPTIONAL MATCH (a:A) -- (n)
WITH n, a, collect(a.actor) as actor
OPTIONAL MATCH (a) -- (n)

When I repeat the WHERE clause, it gives me the correct result:

MATCH (n) WHERE id(n) = 123
OPTIONAL MATCH (a:A) -- (n)
WITH n, a, collect(a.actor) as actor
OPTIONAL MATCH (a) -- (n) WHERE id(n) = 123

I know the behavior with WHERE clause is different when using OPTIONAL MATCH, but how can we avoid the repeat WHERE clause? I have multiple OPTIONAL MATCH queries so, this is not good to go with. What I am missing?

1 ACCEPTED SOLUTION

By latest version I'm assuming 4.2.5?

In any case, this is not a bug, but a misunderstanding of how OPTIONAL MATCH behaves.

Once variables are bound from earlier in the query (n1 and n2) an OPTIONAL MATCH will never be able to change those variables, that would be a filtering operation, and OPTIONAL MATCH never filters out results. That's why I said earlier:

OPTIONAL MATCH only has meaning when it is introducing new variables

There are several ways to get the results you want.

One is to do the OPTIONAL MATCH with a new variable in place of n2, but use a WHERE clause to restrict possible values:

MATCH (n1:Node1) WHERE id(n1) = 123
OPTIONAL MATCH (n2:Node2) - [:FOO] -> (n1)
WITH n1, n2, collect(id(n2)) as foo
OPTIONAL MATCH (x) - [:BAR] -> (n1)
WHERE x = n2
RETURN {foo: foo, bar: collect(id(x))}

You could also use subqueries similarly.

One other thing to mention, your aggregation is flawed:

WITH n1, n2, collect(id(n2)) as foo

Because n2 is in the grouping key, this will not produce the results you want. You will get a row per distinct n1, n2. foo will always be a list of the id for that n2. It will be at least a single element. If there are multiple :FOO relationships between the two specific n1 and n2 nodes on that row, then there may be multiple elements in foo, a repeating of n2's id() for however many :FOO relationships are between those two specific nodes.

For example, assume this graph exists, where there are multiple :FOO relationships between n1 and n2:

Node2{id:456} - [:FOO] -> Node1{id:123}
Node2{id:456} - [:FOO] -> Node1{id:123}
Node2{id:456} - [:FOO] -> Node1{id:123}
Node2{id:456} - [:BAR] -> Node1{id:789}

Then the output of the first half of the query, at this line WITH n1, n2, collect(id(n2)) as foo, is:

{
 n1: Node1{id:123}, n2: Node1{id:456}, foo:[456, 456, 456]
}

If there were other nodes connected by :FOO relationships, you would get similar rows per distinct n1 and n2 pairs, with a similar list of at least one element, though there could be more depending on the number of duplicate :FOO relationships between the pair of nodes for that row.

That's not likely what you want. What is it you want from that aggregation? What is its intended behavior?

View solution in original post

6 REPLIES 6

Well first, your last OPTIONAL MATCH doesn't serve any purpose. n and a are already bound variables, the OPTIONAL MATCH isn't introducing any additional variables, so it actually isn't doing (or shouldn't be doing) anything meaningful. OPTIONAL MATCH only has meaning when it is introducing new variables, so if there's a difference in what is returned, then there's a bug here somewhere.

It would help if you provided an example of correct vs incorrect results, and also let us know the version of Neo4j you are using.

I have given an example. Please check.

ri8ika
Graph Voyager

@andrew.bowman
Okay, let me give you an example.

I only have the following in the database:

Node2{id:456} - [:FOO] -> Node1{id:123}
Node2{id:456} - [:BAR] -> Node1{id:789}

Now, when I use the following statement:

MATCH (n1:Node1) WHERE id(n1) = 123
OPTIONAL MATCH (n2:Node2) - [:FOO] -> (n1)
WITH n1, n2, collect(id(n2)) as foo
OPTIONAL MATCH (n2) - [:BAR] -> (n1)
RETURN {foo: foo, bar: collect(id(n2))}

This is giving me wrong result. I think the WHERE clause is not satisfying with all optional matches?

Output:

{
  "foo": [456],
  "bar": [456]
}

Expected:

{
  "foo": [456],
  "bar": []
}

Because, Node1 with id 123 has no BAR relation.

I am using the latest version of neo4j.

By latest version I'm assuming 4.2.5?

In any case, this is not a bug, but a misunderstanding of how OPTIONAL MATCH behaves.

Once variables are bound from earlier in the query (n1 and n2) an OPTIONAL MATCH will never be able to change those variables, that would be a filtering operation, and OPTIONAL MATCH never filters out results. That's why I said earlier:

OPTIONAL MATCH only has meaning when it is introducing new variables

There are several ways to get the results you want.

One is to do the OPTIONAL MATCH with a new variable in place of n2, but use a WHERE clause to restrict possible values:

MATCH (n1:Node1) WHERE id(n1) = 123
OPTIONAL MATCH (n2:Node2) - [:FOO] -> (n1)
WITH n1, n2, collect(id(n2)) as foo
OPTIONAL MATCH (x) - [:BAR] -> (n1)
WHERE x = n2
RETURN {foo: foo, bar: collect(id(x))}

You could also use subqueries similarly.

One other thing to mention, your aggregation is flawed:

WITH n1, n2, collect(id(n2)) as foo

Because n2 is in the grouping key, this will not produce the results you want. You will get a row per distinct n1, n2. foo will always be a list of the id for that n2. It will be at least a single element. If there are multiple :FOO relationships between the two specific n1 and n2 nodes on that row, then there may be multiple elements in foo, a repeating of n2's id() for however many :FOO relationships are between those two specific nodes.

For example, assume this graph exists, where there are multiple :FOO relationships between n1 and n2:

Node2{id:456} - [:FOO] -> Node1{id:123}
Node2{id:456} - [:FOO] -> Node1{id:123}
Node2{id:456} - [:FOO] -> Node1{id:123}
Node2{id:456} - [:BAR] -> Node1{id:789}

Then the output of the first half of the query, at this line WITH n1, n2, collect(id(n2)) as foo, is:

{
 n1: Node1{id:123}, n2: Node1{id:456}, foo:[456, 456, 456]
}

If there were other nodes connected by :FOO relationships, you would get similar rows per distinct n1 and n2 pairs, with a similar list of at least one element, though there could be more depending on the number of duplicate :FOO relationships between the pair of nodes for that row.

That's not likely what you want. What is it you want from that aggregation? What is its intended behavior?

OPTIONAL MATCH only has meaning when it is introducing new variables

Now, I got your point. Thanks.

Hello @ri8ika

Can you provide us what your graph database must looks like with a tool like arrows.app.
And can you, in short explain in english not Cypher want you want from it?

If your Cypher is wrong it's a lot harder for us to understand what you want from it.