Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
07-31-2019 08:54 AM
Suppose I have the following data:
field1, field2
1,1
1,4
1,3
2,5
2,2
3,1
3,8
3,9
3,11
4,4
How can I write a cypher query to return the node that has the biggest value of field2
for a given field1
. i.e.
1,4
2,5
3,11
4,4
My attempts so far have resulted in the following (which doesn't work):
MATCH(a:A)
WITH COLLECT({field1: a.field1, field2: MAX(a.field2)}) as rowInfos
UNWIND rowInfos as rowInfo
WITH rowInfo.field1 as field1, rowInfo.field2 as maxField2
MATCH (a)
WHERE a.field1 = field1 AND a.field2 = maxField2
return a
If these nodes were rows in a SQL database then the following query would be used:
SELECT
a.* from A
INNER JOIN (
SELECT
field1, max(field2) as field2
FROM A
GROUP BY
field1
) ut
ON
a.field1 = ut.field1
AND a.field2 = ut.field2
07-31-2019 02:12 PM
Here is a solution,
MATCH (f:Fld1)-->(g:Fld2)
RETURN f.f1 as Fld1, max(g.f2) as Fld2;
Result:
MATCH (f:Fld1)-->(g:Fld2)
WITH f.f1 as Fld1, max(g.f2) as Fld2
MATCH (f1:Fld1)-->(g1:Fld2)
WHERE g1.f2 IN [Fld2]
RETURN f1, g1;
Result:
08-01-2019 08:58 AM
Turns out the query to achieve this is rather simple:
MATCH(b:A) WITH b.field1 as field1, MAX(b.field2) as field2 WHERE b.field1 = field1 MATCH (b) WHERE b.field1 = field1 AND b.field2 = field2 RETURN b
08-01-2019 10:09 AM
You don't need that first WHERE clause, the aggregation you're doing on line 2 is enough to get your groupings correct.
I've got a new aggregating APOC proc committed which should be able to more concisely handle a case like this, giving you all items that have the max (or min) of a certain value without having to do a rematch as in this query. That should go out with the next APOC release.
10-09-2020 01:06 PM
Hi,
How about a slight modification where the key value is in the relationship.
(A:Node1)-[r:by]-(c:NodeB)
Goal: for each A return B where r.key is maximum for the grouping of A to multiple B nodes.
Also which of the APOC procedures would avoid needing to do the rematch?
Andy
10-09-2020 03:57 PM
The function is apoc.agg.maxItems()
(there's a minItems()
variant as well).
You can use CALL apoc.help('maxItems()')
to find the description and signature of the function. If it doesn't return, then your version of APOC doesn't contain it, and you might want to look for the latest compatible upgrade for your Neo4j version.
Usage would be like:
MATCH (A:Node1)-[r:by]-(c:NodeB)
WITH A, apoc.agg.maxItems(c, r.key) as result
RETURN A, result.value as max, result.items as topBNodes
In the params, note that we're treating the item (c) separate from the value for which we're calculating the max (r.key). In the resulting map, we can extract the value (this is the maximum value) and the items are the list of all the items with that max value (in case there were ties). If you just need 1, then you can use result.items[0]
or head(result.items)
to get the first one in the list.
10-09-2020 04:50 PM
Perfect,
Thank you very much.
Just for future reference for other newbies: In the
apoc.agg.maxItems(c, r.key) as result
Instead of just 'c' I used c.term where term is a property key that I am interested. Using just 'c' returned the whole node description and I am just interested in a property key value.
Andy
10-09-2020 06:49 PM
That's fine, but it will be more efficient to keep the function parameters as-is (keep using the variable for just the node) and wait to project the property in your RETURN, since you only need to return the property for the top item. During query execution, a node variable is a very lightweight structure, it will only access properties when you project them.
All the sessions of the conference are now available online