Neo4j

nima · ‎08-09-2019

Hi

consider the following query which returns 2 different paths (Starting with Catalog:catalog-1 to SalesCategory:scat-o ) as expected

MATCH (c:Catalog { catalog_id: "catalog-1" }),(s:SalesCategory { sales_category_id: "scat-o" }), p = allShortestPaths((c)-[:HAS_CATEGORY|:HAS_SUBCATEGORY*]-(s))
RETURN DISTINCT p

now, we want to add some extra logic to filter the returned paths: nodes of the returned paths should not have any relation with some other nodes (we call it HideInStore nodes), here's the query:

MATCH (c:Catalog { catalog_id: "catalog-1" }),(s:SalesCategory { sales_category_id: "scat-o" }), p = allShortestPaths((c)-[:HAS_CATEGORY|:HAS_SUBCATEGORY*]-(s))
WITH c,s,p, ["his1","his99"] as his_values
OPTIONAL MATCH(h:HideInStore) where h.hide_in_store_id in his_values 
WITH c,s,p,h 
WHERE NONE (x in nodes (p) where ((x:SalesCategory)-[:HAS_CATEGORY_HIDE_IN_STORE]->(h)) )
RETURN DISTINCT p

here's the case: query works fine if it can find at least one path to return. but when there's no path to return, it will return all avaiable paths. so

it will return all paths, if it cannot find any node in any of the paths related to HideInStore entities (as expected)
it will return expected path if it can find atleast one path to return (as expected)
it will return all paths (instead of none) if there's no path to return (not expected)

I'd appreciate your review on my query. thanks.

andrew_bowman · ‎08-12-2019

Ah, I think I understand the problem.

I think you're expecting a list member check instead of an exact node check when checking if a node in the path has a path to h.

You did an OPTIONAL MATCH to h which resulted in two nodes...that multiplies out against your past matches, so for each path you matched previously, you'll get a row for that path where h = his1, and another row for that path where h = his2. In your pattern, your intent is that you don't want any node in the path to have a path matching the pattern to any h, but that's not what your query is doing. It's evaluating it per h. h does not represent the collection of all nodes that matched that pattern. It represents a specific matched node per row.

So let's look at your paths.

Path1 goes through scat-a1 at the top. After your OPTIONAL MATCH there are now two rows for that path, one for h = his1 and the other for h = his2. During evaluation, looking at the row for h = his1, the predicate fails, because indeed there is a pattern match from one of those path nodes (scat-a1) to his1. So that row gets filtered out. But there's still the row with the same path where h = his2. And none of the nodes in the path has a relationship to his2, so the path is left unfiltered.

The opposite happens for the two rows going through scat-a2. The row where h = his2 gets filtered out, but the row where h = his1 remains.

This is something you need to keep in mind...variables for nodes in your query represent a single node per row. What you did with that OPTIONAL MATCH was basically a cross product of the results of that OPTIONAL MATCH with the previous results.

To accomplish what you really want, you need to collect the nodes to exclude, and use that collection in your predicate. And it would actually be easier to collect() not the :HideInStore nodes, but the :SalesCategory nodes that connect to them, that way your none() predicate only has to ensure that none of the nodes in the path is in the excluded nodes collection. Try this:

OPTIONAL MATCH (h:HideInStore)<-[:HAS_CATEGORY_HIDE_IN_STORE]-(x:SalesCategory)
WHERE h.hide_in_store_id in ["his1","his99"]
WITH collect(DISTINCT x) as excluded
MATCH (c:Catalog { catalog_id: "catalog-1" }),(s:SalesCategory { sales_category_id: "scat-o" }), p = allShortestPaths((c)-[:HAS_CATEGORY|:HAS_SUBCATEGORY*]-(s))
WITH c,s,p
WHERE none(x IN nodes(p) WHERE x IN excluded)
RETURN DISTINCT p

View solution in original post

andrew_bowman · ‎08-09-2019

That's a strange thing. Could we get, after the last WITH clause, the result of:

RETURN size([x in nodes (p) where (x:SalesCategory)-[:HAS_CATEGORY_HIDE_IN_STORE]->(h)]),  
 NONE (x in nodes (p) where (x:SalesCategory)-[:HAS_CATEGORY_HIDE_IN_STORE]->(h) )

nima · ‎08-10-2019

Hi Andrew, & thanks for the response.

here's the modification on the query & it's corresponding response:

Query

MATCH (c:Catalog { catalog_id: "catalog-1" }),(s:SalesCategory { sales_category_id: "scat-o" }), p = allShortestPaths((c)-[:HAS_CATEGORY|:HAS_SUBCATEGORY*]-(s))
WITH c,s,p, ["his2","his1"] as his_values
OPTIONAL MATCH(h:HideInStore) where h.hide_in_store_id in his_values 
WITH c,s,p,h 
RETURN size([x in nodes (p) where (x:SalesCategory)-[:HAS_CATEGORY_HIDE_IN_STORE]->(h)]),  
 NONE (x in nodes (p) where (x:SalesCategory)-[:HAS_CATEGORY_HIDE_IN_STORE]->(h) )

Response

size([x in nodes (p) where (x:SalesCategory)-[:HAS_CATEGORY_HIDE_IN_STORE]->(h)]),NONE (x in nodes (p) where (x:SalesCategory)-[:HAS_CATEGORY_HIDE_IN_STORE]->(h) )
1,false
0,true
0,true
1,false

also, I though it might be better to have an overview of how the simplified version of this problem looks like:

as I mentioned, when there's (at least) one path to return, the result is fine, but when we filter on {"his1","his2"} (meaning that there's no path to return), then the query returns all paths.

andrew_bowman · ‎08-12-2019

Ah, I think I understand the problem.

I think you're expecting a list member check instead of an exact node check when checking if a node in the path has a path to h.

You did an OPTIONAL MATCH to h which resulted in two nodes...that multiplies out against your past matches, so for each path you matched previously, you'll get a row for that path where h = his1, and another row for that path where h = his2. In your pattern, your intent is that you don't want any node in the path to have a path matching the pattern to any h, but that's not what your query is doing. It's evaluating it per h. h does not represent the collection of all nodes that matched that pattern. It represents a specific matched node per row.

So let's look at your paths.

Path1 goes through scat-a1 at the top. After your OPTIONAL MATCH there are now two rows for that path, one for h = his1 and the other for h = his2. During evaluation, looking at the row for h = his1, the predicate fails, because indeed there is a pattern match from one of those path nodes (scat-a1) to his1. So that row gets filtered out. But there's still the row with the same path where h = his2. And none of the nodes in the path has a relationship to his2, so the path is left unfiltered.

The opposite happens for the two rows going through scat-a2. The row where h = his2 gets filtered out, but the row where h = his1 remains.

This is something you need to keep in mind...variables for nodes in your query represent a single node per row. What you did with that OPTIONAL MATCH was basically a cross product of the results of that OPTIONAL MATCH with the previous results.

To accomplish what you really want, you need to collect the nodes to exclude, and use that collection in your predicate. And it would actually be easier to collect() not the :HideInStore nodes, but the :SalesCategory nodes that connect to them, that way your none() predicate only has to ensure that none of the nodes in the path is in the excluded nodes collection. Try this:

OPTIONAL MATCH (h:HideInStore)<-[:HAS_CATEGORY_HIDE_IN_STORE]-(x:SalesCategory)
WHERE h.hide_in_store_id in ["his1","his99"]
WITH collect(DISTINCT x) as excluded
MATCH (c:Catalog { catalog_id: "catalog-1" }),(s:SalesCategory { sales_category_id: "scat-o" }), p = allShortestPaths((c)-[:HAS_CATEGORY|:HAS_SUBCATEGORY*]-(s))
WITH c,s,p
WHERE none(x IN nodes(p) WHERE x IN excluded)
RETURN DISTINCT p

nima · ‎08-12-2019

Thanks Andrew for the help

nima · ‎08-19-2019

so, that solves the previous issue. but it will cause performance issues when we're facing lots of nodes (SalesCategory nodes)

just consider lots of SalesCategories (say, millions) with a relation to one of these hide_in_store values. in this case we are collecting & passing lots of nodes to the second query inorder to exclude them in the predicate. in fact, we're collecting lots of nodes, pass them to predicate to check if a small collection of nodes (say, 10 or 20) in a path contains any of them. this seems not to be the most efficient way & thus, I faced timeouts trying to use this approach in real data. Is there any other way to apply this exclusion/inclusion check without having to collect all nodes & passing them to the path predicate ?

Thanks in advance.

nima · ‎08-20-2019

what about this one: ?

WITH ["his1","his2"] as his_values
OPTIONAL MATCH(h:HideInStore) where h.hide_in_store_id in his_values 

WITH COLLECT(DISTINCT h) as excluded
MATCH (c:Catalog { catalog_id: "catalog-1" }),(s:SalesCategory { sales_category_id: "scat-o" }), p = allShortestPaths((c)-[:HAS_CATEGORY|:HAS_SUBCATEGORY*]-(s))
WHERE ALL (x in nodes (p) where NONE(h in excluded WHERE ((x:SalesCategory)-[:HAS_CATEGORY_HIDE_IN_STORE]->(h))  ))
RETURN DISTINCT p

andrew_bowman · ‎08-20-2019

Yes, I think that will work.

Neo4j

Query returns all (instead of none) when NONE predicate matches all cases

Query

Response