Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
08-09-2019 02:49 AM
Hi
consider the following query which returns 2 different paths (Starting with Catalog:catalog-1
to SalesCategory:scat-o
) as expected
MATCH (c:Catalog { catalog_id: "catalog-1" }),(s:SalesCategory { sales_category_id: "scat-o" }), p = allShortestPaths((c)-[:HAS_CATEGORY|:HAS_SUBCATEGORY*]-(s))
RETURN DISTINCT p
now, we want to add some extra logic to filter the returned paths: nodes of the returned paths should not have any relation with some other nodes (we call it HideInStore nodes), here's the query:
MATCH (c:Catalog { catalog_id: "catalog-1" }),(s:SalesCategory { sales_category_id: "scat-o" }), p = allShortestPaths((c)-[:HAS_CATEGORY|:HAS_SUBCATEGORY*]-(s))
WITH c,s,p, ["his1","his99"] as his_values
OPTIONAL MATCH(h:HideInStore) where h.hide_in_store_id in his_values
WITH c,s,p,h
WHERE NONE (x in nodes (p) where ((x:SalesCategory)-[:HAS_CATEGORY_HIDE_IN_STORE]->(h)) )
RETURN DISTINCT p
here's the case: query works fine if it can find at least one path to return. but when there's no path to return, it will return all avaiable paths. so
I'd appreciate your review on my query. thanks.
Solved! Go to Solution.
08-12-2019 01:42 AM
Ah, I think I understand the problem.
I think you're expecting a list member check instead of an exact node check when checking if a node in the path has a path to h
.
You did an OPTIONAL MATCH to h
which resulted in two nodes...that multiplies out against your past matches, so for each path you matched previously, you'll get a row for that path where h
= his1, and another row for that path where h
= his2. In your pattern, your intent is that you don't want any node in the path to have a path matching the pattern to any h
, but that's not what your query is doing. It's evaluating it per h
. h
does not represent the collection of all nodes that matched that pattern. It represents a specific matched node per row.
So let's look at your paths.
Path1 goes through scat-a1
at the top. After your OPTIONAL MATCH there are now two rows for that path, one for h = his1 and the other for h = his2. During evaluation, looking at the row for h = his1, the predicate fails, because indeed there is a pattern match from one of those path nodes (scat-a1) to his1. So that row gets filtered out. But there's still the row with the same path where h = his2. And none of the nodes in the path has a relationship to his2, so the path is left unfiltered.
The opposite happens for the two rows going through scat-a2. The row where h = his2 gets filtered out, but the row where h = his1 remains.
This is something you need to keep in mind...variables for nodes in your query represent a single node per row. What you did with that OPTIONAL MATCH was basically a cross product of the results of that OPTIONAL MATCH with the previous results.
To accomplish what you really want, you need to collect the nodes to exclude, and use that collection in your predicate. And it would actually be easier to collect() not the :HideInStore nodes, but the :SalesCategory nodes that connect to them, that way your none()
predicate only has to ensure that none of the nodes in the path is in the excluded nodes collection. Try this:
OPTIONAL MATCH (h:HideInStore)<-[:HAS_CATEGORY_HIDE_IN_STORE]-(x:SalesCategory)
WHERE h.hide_in_store_id in ["his1","his99"]
WITH collect(DISTINCT x) as excluded
MATCH (c:Catalog { catalog_id: "catalog-1" }),(s:SalesCategory { sales_category_id: "scat-o" }), p = allShortestPaths((c)-[:HAS_CATEGORY|:HAS_SUBCATEGORY*]-(s))
WITH c,s,p
WHERE none(x IN nodes(p) WHERE x IN excluded)
RETURN DISTINCT p
08-09-2019 12:29 PM
That's a strange thing. Could we get, after the last WITH clause, the result of:
RETURN size([x in nodes (p) where (x:SalesCategory)-[:HAS_CATEGORY_HIDE_IN_STORE]->(h)]),
NONE (x in nodes (p) where (x:SalesCategory)-[:HAS_CATEGORY_HIDE_IN_STORE]->(h) )
08-10-2019 01:16 AM
Hi Andrew, & thanks for the response.
here's the modification on the query & it's corresponding response:
MATCH (c:Catalog { catalog_id: "catalog-1" }),(s:SalesCategory { sales_category_id: "scat-o" }), p = allShortestPaths((c)-[:HAS_CATEGORY|:HAS_SUBCATEGORY*]-(s))
WITH c,s,p, ["his2","his1"] as his_values
OPTIONAL MATCH(h:HideInStore) where h.hide_in_store_id in his_values
WITH c,s,p,h
RETURN size([x in nodes (p) where (x:SalesCategory)-[:HAS_CATEGORY_HIDE_IN_STORE]->(h)]),
NONE (x in nodes (p) where (x:SalesCategory)-[:HAS_CATEGORY_HIDE_IN_STORE]->(h) )
size([x in nodes (p) where (x:SalesCategory)-[:HAS_CATEGORY_HIDE_IN_STORE]->(h)]),NONE (x in nodes (p) where (x:SalesCategory)-[:HAS_CATEGORY_HIDE_IN_STORE]->(h) )
1,false
0,true
0,true
1,false
also, I though it might be better to have an overview of how the simplified version of this problem looks like:
as I mentioned, when there's (at least) one path to return, the result is fine, but when we filter on {"his1","his2"} (meaning that there's no path to return), then the query returns all paths.
08-12-2019 01:42 AM
Ah, I think I understand the problem.
I think you're expecting a list member check instead of an exact node check when checking if a node in the path has a path to h
.
You did an OPTIONAL MATCH to h
which resulted in two nodes...that multiplies out against your past matches, so for each path you matched previously, you'll get a row for that path where h
= his1, and another row for that path where h
= his2. In your pattern, your intent is that you don't want any node in the path to have a path matching the pattern to any h
, but that's not what your query is doing. It's evaluating it per h
. h
does not represent the collection of all nodes that matched that pattern. It represents a specific matched node per row.
So let's look at your paths.
Path1 goes through scat-a1
at the top. After your OPTIONAL MATCH there are now two rows for that path, one for h = his1 and the other for h = his2. During evaluation, looking at the row for h = his1, the predicate fails, because indeed there is a pattern match from one of those path nodes (scat-a1) to his1. So that row gets filtered out. But there's still the row with the same path where h = his2. And none of the nodes in the path has a relationship to his2, so the path is left unfiltered.
The opposite happens for the two rows going through scat-a2. The row where h = his2 gets filtered out, but the row where h = his1 remains.
This is something you need to keep in mind...variables for nodes in your query represent a single node per row. What you did with that OPTIONAL MATCH was basically a cross product of the results of that OPTIONAL MATCH with the previous results.
To accomplish what you really want, you need to collect the nodes to exclude, and use that collection in your predicate. And it would actually be easier to collect() not the :HideInStore nodes, but the :SalesCategory nodes that connect to them, that way your none()
predicate only has to ensure that none of the nodes in the path is in the excluded nodes collection. Try this:
OPTIONAL MATCH (h:HideInStore)<-[:HAS_CATEGORY_HIDE_IN_STORE]-(x:SalesCategory)
WHERE h.hide_in_store_id in ["his1","his99"]
WITH collect(DISTINCT x) as excluded
MATCH (c:Catalog { catalog_id: "catalog-1" }),(s:SalesCategory { sales_category_id: "scat-o" }), p = allShortestPaths((c)-[:HAS_CATEGORY|:HAS_SUBCATEGORY*]-(s))
WITH c,s,p
WHERE none(x IN nodes(p) WHERE x IN excluded)
RETURN DISTINCT p
08-12-2019 02:01 AM
Thanks Andrew for the help
08-19-2019 06:42 AM
so, that solves the previous issue. but it will cause performance issues when we're facing lots of nodes (SalesCategory nodes)
just consider lots of SalesCategories (say, millions) with a relation to one of these hide_in_store values. in this case we are collecting & passing lots of nodes to the second query inorder to exclude them in the predicate. in fact, we're collecting lots of nodes, pass them to predicate to check if a small collection of nodes (say, 10 or 20) in a path contains any of them. this seems not to be the most efficient way & thus, I faced timeouts trying to use this approach in real data. Is there any other way to apply this exclusion/inclusion check without having to collect all nodes & passing them to the path predicate ?
Thanks in advance.
08-20-2019 01:57 AM
what about this one: ?
WITH ["his1","his2"] as his_values
OPTIONAL MATCH(h:HideInStore) where h.hide_in_store_id in his_values
WITH COLLECT(DISTINCT h) as excluded
MATCH (c:Catalog { catalog_id: "catalog-1" }),(s:SalesCategory { sales_category_id: "scat-o" }), p = allShortestPaths((c)-[:HAS_CATEGORY|:HAS_SUBCATEGORY*]-(s))
WHERE ALL (x in nodes (p) where NONE(h in excluded WHERE ((x:SalesCategory)-[:HAS_CATEGORY_HIDE_IN_STORE]->(h)) ))
RETURN DISTINCT p
08-20-2019 08:51 AM
Yes, I think that will work.
All the sessions of the conference are now available online