cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

MAX aggregation never quite seems to behave as I expect

pdrangeid
Graph Voyager

If I'm ONLY looking for a max value from a single node query, then max behaves as I expect. But as soon as I want to query some relationships, max seems to do nothing, and when I attempt to apply a LIMIT or UNWIND I end up only getting a portion of what I'm looking for.

Let me explain my query. I want to find all contracts that are still active, but are expired (their end-date is greater than today).
Then, for ALL those contracts, I'd like to find any that have had active tickets in the last year. If they have, give me the latest (LAST) date that a new ticket was created, and return all that together.

I am very close, but I seem to only be able to return EVERY ticket that is in the last year against said contracts. thus my challenge understanding aggregation with MAX().

Here's my (not quite working) Cypher:

MATCH (cc:Crmcontract) where cc.enddate < timestamp() and (cc)--(:Crmcontractstatus {value:'A'})
WITH *,apoc.date.format(coalesce(cc.enddate,0),'ms',"yyyy-MM-dd'T'HH:mm:ss",'CST') as enddate,timestamp()-31104000000 as ayearago
MATCH (cc)--(t:Ticket) where t.createdon > ayearago
WITH *,max(t.createdon) as maxticket
WITH * where t.createdon = maxticket
WITH *,apoc.date.format(maxticket,'ms',"yyyy-MM-dd'T'HH:mm:ss",'CST') as lastticket
return cc.code,cc.name,lastticket,cc.enddate
2 REPLIES 2

You'll need to fully understand aggregation functions for this.

The key thing you are missing is that aggregation functions, like max(), only have meaning with respect to their grouping key, which is the set of non-aggregation variables in scope at the point of the aggregation. Your use of WITH * makes it harder to see at a glance what variables are in scope, and thus harder to determine the variables that make up your grouping key.

Looking at the previous lines of your query, the following variables are in scope and form the grouping key:
cc, enddate, ayearago, t

The problematic variable in this case is t. In english, this part of the query where you aggregate is asking "for the maximum value of t's createdon property for each single node t". In other words, for each particular t node (you have a single t node per row) you're getting the max value of that single t node's property, not the max value considering all possible t nodes.

In order to get the max() value across all t nodes, you either need to not have t in scope at all, or you need to aggregate t such that you get a collection of t nodes, and remove t as part of the grouping key (since the grouping key only consists of non-aggregation variables).

You may want to not use WITH * in your queries, especially when you need to aggregate like this, as it can throw you off, and there's no way to use * to say "WITH everything except t".

So here are some alternate queries that may do what you want (assuming your grouping is what you want).

MATCH (cc:Crmcontract) where cc.enddate < timestamp() and (cc)--(:Crmcontractstatus {value:'A'})
WITH *,apoc.date.format(coalesce(cc.enddate,0),'ms',"yyyy-MM-dd'T'HH:mm:ss",'CST') as enddate,timestamp()-31104000000 as ayearago
MATCH (cc)--(t:Ticket) where t.createdon > ayearago
WITH cc, enddate, max(t.createdon) as maxticket
WITH *,apoc.date.format(maxticket,'ms',"yyyy-MM-dd'T'HH:mm:ss",'CST') as lastticket
return cc.code,cc.name,lastticket,cc.enddate

I changed this such that max() is being applied per cc and enddate. If that's not what you need, please clarify and we can continue to work on this one.

pdrangeid
Graph Voyager

That was an excellent education! That is exactly what I needed, and think I have a better grasp on how to execute aggregation with max().

Thanks Andrew!