cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Cypher Syntax: Filtering Nodes in WHERE clause

I am trying to simplify a working query, but am getting unexpected results.

In my example graph, I have Users and Articles. The only relationship is (u:User) -[:HAS_READ]->(a:Article) . Users and Articles have Names.

I am trying to build a "weak tie" query that returns the Name of an article that has a given "distance" (measured by number of Users in the path) from a starting User. In a simple case of a "distance" of 1, an Article 2 is returned if User 1 has read Article 1, which User 2 had also read. User 2, in this case, has also read Article 2, which User 1 has not read.

I have a long query already working for distances of 2 and 3, but I want to try to simplify it.

Here is my latest query for distance=2 that does not return an error (but isn't working as expected):

MATCH  (u1:User {Name: "Bob"})-[:HAS_READ]->(a1:Article)<-[:HAS_READ]-(u2:User)-[:HAS_READ]->(a2:Article)<-[:HAS_READ]-(u3:User)-[:HAS_READ]->(a3:Article)
WHERE NOT (u1)-[:HAS_READ]-(a3) AND NOT (u2)-[:HAS_READ]-(a3)
RETURN DISTINCT a3.Name

The problem: depending on the path, a User can be in the list of u2 Users and u3 Users, so I am getting Articles that have been read by a u2 User.

I know how to do this with collect and WITH (using a very long-winded query), but is there a way to filter out Users who are in both u2 and u3 collections in the WHERE clause? When I try using collect in the WHERE clause, I get an error.

Here is an example query that returns an error:

MATCH  (u1:User {Name: "Bob"})-[:HAS_READ]->(a1:Article)<-[:HAS_READ]-(u2:User)-[:HAS_READ]->(a2:Article)<-[:HAS_READ]-(u3:User)-[:HAS_READ]->(a3:Article)
WHERE NOT (u1)-[:HAS_READ]-(a3) AND NONE (x in collect(u2) WHERE x in collect(u3))
RETURN DISTINCT a3.Name

The error: "Invalid use of aggregating function collect(...) in this context"

I am probably just missing some obvious Cypher concept or simple syntax (I hope), or I'm trying to simplify something which cannot be simplified...

Just for kicks, here is a working (long) query for distance = 2:

MATCH  (u1:User {Name: "Bob"})-[:HAS_READ]-(a1:Article)-[:HAS_READ]-(u2:User)-[:HAS_READ]-(a2:Article)
WHERE NOT (u1)-[:HAS_READ]-(a2)
WITH u1,collect(u2) as u2_list,a2
MATCH (a2)-[:HAS_READ]-(u3:User)-[:HAS_READ]-(a3:Article)
WHERE NOT (u1)-[:HAS_READ]-(a3)
WITH a2,a3,u2_list,collect(u3) as u3_list
MATCH (a2)-[:HAS_READ]-(u3)-[:HAS_READ]-(a3)
WHERE NONE (x in u3_list WHERE x in u2_list)
RETURN DISTINCT a3.Name
1 ACCEPTED SOLUTION

I created a sample graph that matches the model you described:

CREATE
  (`0` :User {name:'User1'}) ,
  (`1` :Article {title:'Article1'}) ,
  (`2` :User {name:'User2'}) ,
  (`3` :Article {title:'Article2'}) ,
  (`4` :User {name:'User3'}) ,
  (`5` :Article {title:'Article3'}) ,
  (`6` :User {name:'User4'}) ,
  (`0`)-[:`HAS_READ` ]->(`1`),
  (`2`)-[:`HAS_READ` ]->(`1`),
  (`2`)-[:`HAS_READ` ]->(`3`),
  (`4`)-[:`HAS_READ` ]->(`3`),
  (`4`)-[:`HAS_READ` ]->(`5`),
  (`6`)-[:`HAS_READ` ]->(`3`),
  (`6`)-[:`HAS_READ` ]->(`1`)

And then I think the following query gets the same results as your last one:

MATCH  (u1:User {name: "User1"})-[:HAS_READ]->(a1:Article)<-[:HAS_READ]-(u2:User)-[:HAS_READ]->(a2:Article)<-[:HAS_READ]-(u3:User)-[:HAS_READ]->(a3:Article)
WHERE not((u1)-[:HAS_READ]-(a3))
AND not((u3)-[:HAS_READ]->()<-[:HAS_READ]-(u1))
RETURN DISTINCT a3.title

The logic is different but I think it does the same thing. We're making sure that u3 doesn't have a distance=1 connection to u1 via that 2nd part of the WHERE clause.

View solution in original post

3 REPLIES 3

I created a sample graph that matches the model you described:

CREATE
  (`0` :User {name:'User1'}) ,
  (`1` :Article {title:'Article1'}) ,
  (`2` :User {name:'User2'}) ,
  (`3` :Article {title:'Article2'}) ,
  (`4` :User {name:'User3'}) ,
  (`5` :Article {title:'Article3'}) ,
  (`6` :User {name:'User4'}) ,
  (`0`)-[:`HAS_READ` ]->(`1`),
  (`2`)-[:`HAS_READ` ]->(`1`),
  (`2`)-[:`HAS_READ` ]->(`3`),
  (`4`)-[:`HAS_READ` ]->(`3`),
  (`4`)-[:`HAS_READ` ]->(`5`),
  (`6`)-[:`HAS_READ` ]->(`3`),
  (`6`)-[:`HAS_READ` ]->(`1`)

And then I think the following query gets the same results as your last one:

MATCH  (u1:User {name: "User1"})-[:HAS_READ]->(a1:Article)<-[:HAS_READ]-(u2:User)-[:HAS_READ]->(a2:Article)<-[:HAS_READ]-(u3:User)-[:HAS_READ]->(a3:Article)
WHERE not((u1)-[:HAS_READ]-(a3))
AND not((u3)-[:HAS_READ]->()<-[:HAS_READ]-(u1))
RETURN DISTINCT a3.title

The logic is different but I think it does the same thing. We're making sure that u3 doesn't have a distance=1 connection to u1 via that 2nd part of the WHERE clause.

Thanks Mark! This definitely points me in the right direction. There are some additional subtleties, such as an Article of distance = 2 that is also read by a user of distance =1, but I can play around with it. Thanks for the tip on the latter part of the WHERE clause. That's definitely useful syntax.

Just a quick minor thing to note, NOT is a keyword, not a function, so try to avoid using not(). Prefer something like this instead:

WHERE NOT (u1)-[:HAS_READ]-(a3)

You can of course use parenthesis for doing grouping of boolean expressions when needed, which can look similar, but just wanted clarify the syntax.