cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Aggregate functions for paths

Hi all,

I've got a query that finds some paths through a Marking relationship, where the "Marking" relationship contains a property "dt_val" (datetime).

What I would like to do, is for each path, get the largest "dt_val" value in that path. I thought that would be done with the following query:

MATCH p = (s)-[r:Marking*]->(t)
RETURN p, max([rel in relationships(p)|rel.dt_val]) as highest_dt

However, this returns a list of all values of "dt_val" as opposed to the largest one. When I remove p, this returns a list of 4 items, but the number of paths is much higher than 4. I believe this is because the function is being applied in the wrong direction in this matrix; 4 in this case is the largest number of :Marking relationships that exists per path.

Is anyone able to spot why it isn't returning the maximum, based on the list comprehension, for each path?

Many thanks,
Nick

1 ACCEPTED SOLUTION

The problem is max() is an aggregation function, not a scaler function, it works across rows, not across values in a list.

Cypher doesn't currently have a scaler max() function, but APOC Procedures does in the form of apoc.coll.max(). Install APOC and try this:

MATCH p = (s)-[r:Marking*]->(t)
RETURN p, apoc.coll.max([rel in relationships(p) | rel.dt_val]) as highest_dt

Be aware that p will match to every path possible (including smaller segments of larger paths) that uses :Marking relationships. If you wanted to only evaluate paths where s is a start node (with no incoming :Marking rels) and t is an end node (with no outgoing :Marking rels), then you should add that restriction as a WHERE clause:

MATCH p = (s)-[r:Marking*]->(t)
WHERE NOT ()-[:Marking]->(s) AND NOT (t)-[:MARKING]->()
RETURN p, apoc.coll.max([rel in relationships(p) | rel.dt_val]) as highest_dt

View solution in original post

4 REPLIES 4

Should this not read:

max([rel in relationships(p)|rel.dt_val]) as highest_dt

Correct, thanks. Have updated

The problem is max() is an aggregation function, not a scaler function, it works across rows, not across values in a list.

Cypher doesn't currently have a scaler max() function, but APOC Procedures does in the form of apoc.coll.max(). Install APOC and try this:

MATCH p = (s)-[r:Marking*]->(t)
RETURN p, apoc.coll.max([rel in relationships(p) | rel.dt_val]) as highest_dt

Be aware that p will match to every path possible (including smaller segments of larger paths) that uses :Marking relationships. If you wanted to only evaluate paths where s is a start node (with no incoming :Marking rels) and t is an end node (with no outgoing :Marking rels), then you should add that restriction as a WHERE clause:

MATCH p = (s)-[r:Marking*]->(t)
WHERE NOT ()-[:Marking]->(s) AND NOT (t)-[:MARKING]->()
RETURN p, apoc.coll.max([rel in relationships(p) | rel.dt_val]) as highest_dt

Thanks Andrew, that's done the trick. My neo installation is on an offline machine, and the match query is quite large (and irrelevant in this case) so didn't include the bulk of it. Much appreciated!