cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Categorizing Nodes based on their weight compared to peers

Hi: I do have the following scenario:


The challenge is to add a property to each Fruit with one of this options ('TOP','MIDDLE','BOTTOM'). So it would be 'TOP' if they are in the top 30% with the highest weights, ´MIDDLE' in the top 70% highest weights, and 'BOTTOM' for the rest.
So I generate my sample grapgh with this code:
// clear data
MATCH (n)
DETACH DELETE n;

//Create Sample Basket
CREATE (:Fruit {Name: "Pear"})<-[:INCLUDES {Weight: 0.05}]-(b:Basket)-[:INCLUDES {Weight: 0.25}]->(:Fruit {Name: "Apple"}),(:Fruit {Name: "Ananas"})<-[:INCLUDES {Weight: 0.15}]-(b)-[:INCLUDES {Weight: 0.30}]->(:Fruit {Name: "Peach"}),(b)-[:INCLUDES {Weight: 0.25}]->(:Fruit {Name: "Banana"})

And then I Bring the fruits in the basket ordered by they weight percentage over the basket total.
MATCH (b:Basket)-[i:INCLUDES]-(f:Fruit)
Return f.Name as FruitName, i.Weight as FruitPercent ORDER BY FruitPercent DESC
and I get:

FruitName FruitPercent
"Peach" 0.3
"Apple" 0.25
"Banana" 0.25
"Ananas" 0.15
"Pear" 0.05

What I do need to produce is this instead:

FruitName FruitPercent FruitPercentAccumulated Category
"Peach" 0.3 0.3 'TOP'
"Apple" 0.25 0.55 'MIDDLE'
"Banana" 0.25 0.7 ´MIDDLE'
"Ananas" 0.15 0.85 'BOTTOM'
"Pear" 0.05 1.00 'BOTTOM'

I have tried many alternatives to generate the accumulated percentage but did not found any that worked, since it requieres to use a calculated variable over the first list. At the end I also want to insert back the atribute Category in every Fruit node with the result value.

The problem I am exposing is a very simplificated example that illustrates what I am really trying to solve which is the ABCFMR Analisys in Logistics.

.

Thanks in advance for any help!!

15 REPLIES 15

Nice challenge I like it, however I'm not sure if it's optimised for Neo4j.

I didn't figure out the whole thing yet but this part is a good start:

WITH [0.3,0.25,0.25,0.15,0.05] AS FruitPercents
RETURN [i IN range(1,size(FruitPercents)) | reduce(sum = 0.0, x IN FruitPercents[0..i] | sum + x)] AS FruitPercentAccumulated

The returns values are not precise and even wrong sometimes, I don't know what is going on here, maybe a Neo4j staff member can help us.

Thanks Gabriel!!! a good tip, I am able now to get the accumulated percentage. The rounding errors while surprising are not that relevant at this point.
Now the challenge is to maintain the percentages linked to the correspondent relationships and being able to build this output:

FruitName FruitPercent FruitPercentAccumulated Category
"Peach" 0.3 0.3 'TOP'
"Apple" 0.25 0.55 'MIDDLE'
"Banana" 0.25 0.7 ´MIDDLE'
"Ananas" 0.15 0.85 'BOTTOM'
"Pear" 0.05 1.00 'BOTTOM'

It is being really hard to code something that would be so easy in other languages with loop, local and global variable, etc.

Hi again @rfredesveas

Can I have the Cypher query you used for this result, I might be able to write back these results in the corresponding relationships.

But I don't understand what are you actually trying to do here?
The accumulated result doesn't make sens on the relationship as it's not a coherent part of the graph.

Thanks you so much Gabriel for your help. I do not have that Cypher, that is the result I want to produce.
So these are the steps I tried to implement.

  1. Generate a list of all the relationships from basket to fruits with their relative weights in the basket.
  2. Sort that list from the highest to the smallest weight.
  3. Generate a new property with the accumulated weight.
  4. The top 30% most relevant fruits in the baskets get a 'TOP' category, the following 40% get a 'MIDDLE' and the bottom 30 get a 'BOTTOM'
  5. Add a property to each fruit with the category they obtained.

My code so far for this example is the following:
// clear data

MATCH (n)

DETACH DELETE n;

//Create Sample Basket

CREATE (:Fruit {Name: "Pear"})<-[:INCLUDES {Weight: 0.05}]-(b:Basket)-[:INCLUDES {Weight: 0.25}]->(:Fruit {Name: "Apple"}),

(:Fruit {Name: "Ananas"})<-[:INCLUDES {Weight: 0.15}]-(b)-[:INCLUDES {Weight: 0.30}]->(:Fruit {Name: "Peach"}),

(b)-[:INCLUDES {Weight: 0.25}]->(:Fruit {Name: "Banana"})

//Bring the fruits in the basket ordered by they weith percentege over the total

    MATCH path= (b:Basket)-[incl:INCLUDES]-(f:Fruit) 

    UNWIND relationships(path) as inc

    with collect(toFloat(inc.Weight)) as FruitPercents

    RETURN [i IN range(1,size(FruitPercents)) | reduce(sum = 0.0, x IN FruitPercents[0..i] | sum + x)] AS FruitPercentAccumulated

I hope this clarifies better the challenge.
Saludos y muchas gracias!!
Rodrigo

Yep, after consideration, I still think that your request is something like a post Neo4j request result's treatment. That's why it's so not natural to do it with Neo4j.

But your problem makes me realise that the UNWIND clause sucks a bit since it's not possible to unwind more than one list as columns of the same row. Maybe it's a part of the principles of the data transfer between procedures behind the Cypher query language but it's definitely a limitation and I didn't see any work around it for now.

So I would call for help here by a Neo4j staff, god or a lucky community member with the good idea at the right moment lol

Well. Thanks you Gabriel for the efforts!!!
Do you have any idea on how I can get access to neo4j experts?

Try this:

match (b:Basket)-[r]-(a)
with collect(distinct id(b)) as b1, collect(distinct id(a)) as a1, collect(r.Weight) as wt
with apoc.coll.sort(wt) as wt1, b1, a1
//with toFloat("0.3") as r1, wt1

//using the percentile calculation like in graduate school......
//https://www.dummies.com/education/math/statistics/how-to-calculate-percentiles-in-statistics/

with round(toFloat("0.3") * size(wt1)) as t1, round(toFloat("0.7") * size(wt1)) as t2, wt1, b1, a1
with b1, a1, wt1, toInteger(t1) as t11, toInteger(t2) as t22

MATCH (b2)-[r1]-(a2)
where id(b2) in b1 and id(a2) in a1
with a2.Name as name, r1.Weight as weight,
toFloat(wt1[t11-1]) as q1, toFloat(wt1[t22-1]) as q2

with name, weight, q1, q2, 
CASE
   WHEN weight <= q1 THEN 'BOTTOM'
   WHEN q1 < weight <= q2 THEN 'MEDIUM'
   WHEN weight > q2 THEN 'TOP'
   ELSE weight END as level
   
   return name, weight, level order by weight  desc
 If you are using Neo4j version > 4.1 this query can be simplified. Am expecting this query to work in all versions.

Result:

They identify themself with the Neo4j Staff subtitle by their name in the community forum.
They are the most qualify to know the limits and possibilities of Neo4j.

You can click on their profile picture and message them personally about your topic.
Usually you can get some's attention with the @ operator but it works only with the active participant of a topic.

I figure out most of it but there is always this underlying loop behavior between each Neo4j clauses who fucked the end result. But I'm close.

@ameyasoft gave me an idee with the CASE clause.

I think I know exactly how to finish this _______ problem 🙂
With a short and sweet solution

SO paciently waiting for it

Finally did it

// Basket Beta
MATCH (:Basket)-[i:INCLUDES]->(f:Fruit)
WITH f.name AS FruitName, i.weight AS FruitPercent ORDER BY FruitPercent
WITH collect(FruitName) AS FruitNames, collect(FruitPercent) AS FruitPercents
WITH FruitNames, FruitPercents, [i IN range(1,size(FruitPercents)) | reduce(sum = 0.0, x IN FruitPercents[0..i] | sum + x)] AS Accumulation
UNWIND [i IN range(0,size(Accumulation) - 1) | [FruitNames[i], FruitPercents[i], Accumulation[i]]] AS row
RETURN row[0] AS FruitName, row[1] AS FruitPercent, row[2] AS FruitPercentAccumulated

Even if the multiple WITH clauses seems to make the query longer, it's actually just aggregation processing who are not doing db hits. Meaning the query performance is good or really good depending on how indexes are created and only if the data model stay small.

There is one column left to do, I don't have so much time tonight but I think it's now possible to do it with the CASE approach above. If you have anymore question about my solution let me know.

Thanks you so much Gabriel!! I was raching kind of the same result with the reduce sentence and then the Unwind.
But this helps a lot! very clear code. I'll be now integrating the whole and applying it to the real problem I am trying to solve. I'll let you know how it ends!

Muchas Gracias!

I would like to know how it ends at a larger scale.

Your welcome

The main problem here was to be able to unwind not 1 but 3 columns at the same time.
The idea behind the solution was to encode each element of the unwinded list as a row with columns.

I could have encoded my columns set as a map instead and having the same result.
The difference would be the way I decode my columns in the return clause.

Your problem was a pain in the ass to resolve for me but at the end it wasn't so hard to resolve. That's why I'm thinking about writing an article about Cypher.

If you don't mind, I would like to use your example in it. It's a really tricky one who force yourself to understand how Cypher works and using it beyond what you know. But it's a cute and easy to understand model.

I would even help you if u wish.

I am using this as part of a bigger problem and am still far to be done. But I would be glad to help.

bests