
Query for count of nodes using boolean expression to determine membership in other nodes

I am trying to see whether the following scenario is possible with Neo4j. I have many "user" nodes and many "segment" nodes in a many-to-many relationship: each "user" is a member of any number of "segments". The primary key of the "users" is "userId" and of the "segments" is "segmentId". I want to query Neo4j for the number of users who satisfy a boolean expression about their segment membership, for example the number of users who are members of (("123" OR "456") AND NOT "789"). Since the segmentId is numerical, I'm trying to add unique labels to my segment nodes and query using a relationship expression, but it seems to evaluate my boolean expression only at each node itself, not as an aggregate over all of a user's segments. Is what I am trying to do possible or recommended?

 

Example query I am trying to run:

MATCH (n:User)-[:IS_MEMBER]->(s:(`1`|`2`)&!`3`)
RETURN count(*);

 

1 ACCEPTED SOLUTION

I think you got it right. I switched the order of the boolean expressions, as I figured the 'not' condition would be true more often, so the rest of the expression does not need to be evaluated once the 'not' condition is determined to be true. I also included the 'exists' clause, as this is the new syntax.

I don't think it hurts, but I don't think the 'distinct' is necessary, as you can only get one match for each node with this query. A scenario where you could get multiple values for the same 'n' would be if you were matching a path pattern and multiple paths were found for a given 'n'.

MATCH (n:User)
WHERE NOT exists((n)-[:IS_MEMBER]->({segmentId: 3}))
   OR (exists((n)-[:IS_MEMBER]->({segmentId: 1})) AND exists((n)-[:IS_MEMBER]->({segmentId: 2})))
RETURN count(*);
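
Note that the query above combines the conditions as (1 AND 2) OR NOT 3, whereas the question states the goal as ((123 OR 456) AND NOT 789). For the expression as worded in the question, a minimal sketch might look like the following. It assumes segment nodes carry a `:Segment` label and a numeric `segmentId` property (neither is confirmed by the thread), and it uses the `EXISTS { ... }` subquery form, which needs Neo4j 4.0 or later.

```cypher
// Sketch only: counts users in segment 123 or 456 who are not in segment 789.
// Assumption: segments are (:Segment {segmentId: <number>}) nodes.
MATCH (n:User)
WHERE EXISTS {
        MATCH (n)-[:IS_MEMBER]->(s:Segment)
        WHERE s.segmentId IN [123, 456]
      }
  AND NOT EXISTS {
        MATCH (n)-[:IS_MEMBER]->(:Segment {segmentId: 789})
      }
RETURN count(n) AS userCount;
```

Each `EXISTS` block is evaluated once per user, so the boolean expression applies to the user's membership as a whole rather than to individual segment nodes.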

 


4 REPLIES

You don't use primary keys and foreign keys to relate nodes as you would in a relational database. You use relationships to link nodes together directly.

You can have the equivalent of a many-to-many relationship between user and segment nodes just by relating them to each other. There is no need for a join table. You can use relationship properties to store values that belong to the relationship, as you would with values in a join table.

I would not use labels to identify segments; instead, use a property on 'Segment' nodes to uniquely identify each segment. Then you relate users to each segment node they are a member of. You can easily query for all users that are members of the specified segments using a boolean expression, and get the count of the users returned.
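
As a rough sketch of that model, the statements below create one user, one segment, and the membership relationship between them. The constraint names, the sample IDs, and the `since` relationship property are made up for illustration. The `FOR ... REQUIRE` constraint syntax assumes a recent Neo4j version (4.4 or later).

```cypher
// Assumed schema: (:User {userId})-[:IS_MEMBER]->(:Segment {segmentId})
// Uniqueness constraints also give indexed lookups on the ID properties.
CREATE CONSTRAINT user_id IF NOT EXISTS
FOR (u:User) REQUIRE u.userId IS UNIQUE;

CREATE CONSTRAINT segment_id IF NOT EXISTS
FOR (s:Segment) REQUIRE s.segmentId IS UNIQUE;

// Relate a user to a segment directly; no join table is involved.
// 'since' is a hypothetical relationship property, shown only to
// illustrate where join-table columns would go.
MERGE (u:User {userId: 'u1'})
MERGE (s:Segment {segmentId: 123})
MERGE (u)-[r:IS_MEMBER]->(s)
  ON CREATE SET r.since = date();
```

The constraint statements and the data-writing statement need to run as separate statements, since schema changes and data changes cannot share a transaction.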

Thanks for the quick response! OK, yeah, I agree that using segment IDs as labels did not seem like a good idea. For the boolean expression above, what would the query look like? Should I be using a relationship expression to do this, or stick with node label expressions?

I came up with this query below: 

 
MATCH (n:User)
WHERE ((n)-[:IS_MEMBER]->({segmentId: 1}) AND (n)-[:IS_MEMBER]->({segmentId: 2})) OR (NOT (n)-[:IS_MEMBER]->({segmentId: 3}))
RETURN count(DISTINCT n);
 
Is this an optimal way to query? I'm hoping this will scale for my dataset, which has billions of users and tens of thousands of segments.
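
One alternative way to phrase the same (1 AND 2) OR NOT 3 check is to collect each user's relevant segment IDs first and then test the collected list. This is only a sketch: it assumes a `:Segment` label on segment nodes, and whether it performs better or worse than the pattern-predicate version would have to be checked with `PROFILE` on real data.

```cypher
// Sketch only: evaluate the boolean expression per user over an aggregated list.
// Assumption: segments are (:Segment {segmentId: <number>}) nodes.
MATCH (n:User)
// OPTIONAL MATCH keeps users with none of these segments; they satisfy NOT 3.
OPTIONAL MATCH (n)-[:IS_MEMBER]->(s:Segment)
WHERE s.segmentId IN [1, 2, 3]
// collect() skips nulls, so such users end up with an empty list.
WITH n, collect(s.segmentId) AS ids
WHERE (1 IN ids AND 2 IN ids) OR NOT (3 IN ids)
RETURN count(n) AS userCount;
```

Because the `WITH` aggregates to one row per user, no `DISTINCT` is needed in the final count.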


 
