cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

SOLVED: Iterating over a query based on multiple values of property X and summing the yielded result

dave
Node Clone

I know that there is no way of using MATCH, UNWIND or FILTER within a FOREACH loop, but do not what could solve the problem. I have the following query to start,

MATCH (u:User)-[s:SIMILAR]-(:User)-[t:TRIED]-(p:Product)

The TRIED has 10 properties with values 0-3 but the SIMILAR relationship has a property of s.strength which can be any integer value greater than 5

What I am trying to create is an average of averages using the following

sum((strength * (average of t1 at strength = 6) + (strength * (average of t1 at strength = 7) ..... )

example:
strength = 7
t1 = 1, 1, 1, 0
result = 5.75

the following grid is the desired average results from the first part of the equation

       s.strength         | strSum
	  |  7    |  8  |  9  |
t.t1  | 5.25  |  0  |  0  | 5.25
t.t2  | 19.25 |  16 |  27 | 62.25 
t.t3  |  21   |  24 |  27 | 72

I then want to take the strSum value and divide it

sum((strength * (average of t1 at strength = 6) + (strength * (average of t1 at strength = 7) ..... ) / sum(distinct s.strength)

to give me an average of averages weighted by s.strength.

Does anyone have a way to iterate through all the values of s.strength and return these weighted averages or is there an APOC proceedure that would help with this. I have been looking at the periodic.rock_n_roll_while bit am not quite sure if this is right.

Below is some test data to create and setup the graph that would give the above grid results

CREATE (u:User {ID: 'A'}), (p:Product {ID: 'A'}) WITH u, p
MERGE (u)-[:SIMILAR {strength: 7}]-(:User {ID: 'B'})-[t:TASTED {t1: 1, t2: 3, t3: 3}]-(p) WITH u, p
MERGE (u)-[:SIMILAR {strength: 7}]-(:User {ID: 'C'})-[t:TASTED {t1: 1, t2: 3, t3: 3}]-(p) WITH u, p
MERGE (u)-[:SIMILAR {strength: 7}]-(:User {ID: 'D'})-[t:TASTED {t1: 1, t2: 3, t3: 3}]-(p) WITH u, p
MERGE (u)-[:SIMILAR {strength: 7}]-(:User {ID: 'E'})-[t:TASTED {t1: 0, t2: 2, t3: 3}]-(p) WITH u, p
MERGE (u)-[:SIMILAR {strength: 8}]-(:User {ID: 'F'})-[t:TASTED {t1: 1, t2: 2, t3: 3}]-(p) WITH u, p
MERGE (u)-[:SIMILAR {strength: 8}]-(:User {ID: 'G'})-[t:TASTED {t1: 1, t2: 2, t3: 3}]-(p) WITH u, p
MERGE (u)-[:SIMILAR {strength: 9}]-(:User {ID: 'H'})-[t:TASTED {t1: 0, t2: 3, t3: 3}]-(p)

EDIT: fix incorrect values in t1

1 ACCEPTED SOLUTION

dave
Node Clone

Found something by accident in a question on StackOverflow which solved this using

UNWIND RANGE(min,max)

And creating nodes for each strength with the averages.

For anyone reading this and looking for a solution, there is this.

MATCH 
    (u1:User {ID: 'A'})-[s:SIMILAR]-(:User)-[t:TASTED]-(p:Wine {ID: 'A'})
WITH u1, p,
    min(s.strength) as minS,
    max(s.strength) as maxS
WITH 
    u1, p, range(minS, maxS) AS range
UNWIND 
    range AS range2
WITH 
     distinct toInteger(range2) as i, u1, p
MATCH 
    (u1)-[s2:SIMILAR]-(:User)-[t2:TASTED]-(p) WHERE s2.strength = i
WITH 
    u1, p, i,
    COLLECT(toInteger(t2.taste1)) AS taste1,
    COLLECT(toInteger(t2.taste2)) AS taste2,
    COLLECT(toInteger(t2.taste3)) AS taste3
MERGE 
    (a:Average {user: u1._id, product: p._id, strength: i})
ON MATCH SET 
    a.taste1 = apoc.coll.avg(taste1) * i,
    a.taste2 = apoc.coll.avg(taste2) * i,
    a.taste3 = apoc.coll.avg(taste3) * i
ON CREATE SET 
    a.taste1 = apoc.coll.avg(taste1) * i,
    a.taste2 = apoc.coll.avg(taste2) * i,
    a.taste3 = apoc.coll.avg(taste3) * i
WITH u1, p
MATCH (a2:Average {user: u1._id, product: p._id})
WITH 
    u1, p,
    sum(a2.taste1)/sum(a2.strength) AS avg1, 
    sum(a2.taste2)/sum(a2.strength) AS avg2, 
    sum(a2.taste3)/sum(a2.strength) AS avg3
MERGE (u1)-[r:AVG_SCORE]->(p) 
    ON CREATE SET 
        r.taste1 = avg1, 
        r.taste2 = avg2, 
        r.taste3 = avg3
    ON MATCH SET 
        r.taste1 = avg1, 
        r.taste2 = avg2, 
        r.taste3 = avg3
RETURN r

If there is a more efficient method then great but this is a working solution for now.

View solution in original post

10 REPLIES 10

I'll see what I can do. It's important to note that there's a minor flaw in your create query, you need WITH u, p on the first line, to reuse the p node, otherwise a new blank node is created and used for p for the rest of the query.

I think I'm missing something about your calculations. Here's your example:

example:
strength = 7
t1 = 0, 0, 0, 1
result = 5.75

With a strength of 7 and the given t1 values, the formula doesn't seem to work out to your expected result:

(strength * (average of t1 at strength = 7))
= (7 * 0.25)
= 1.75

Can you clarify what calculation(s) you're using to produce the result in your example?

dave
Node Clone

Yes you are right, it was late at night. My spread sheet was setup with the

t1 as 1, 1, 1, 0

So yes in the example I gave, it is as you say 1.75

Sorry for the confusion

Please edit the original post to fix it.

dave
Node Clone

Edited.

I had tried last night to create a new (:Average) node inside of a FOREACH loop, and then I could just sum all the t1 values from the :Average nodes, in the following query, but cannot MATCH.

MATCH (u:User {ID: 'A'})-[s:SIMILAR]-(:User)-[t:TASTED]-(p:Product {ID: 'A'})
WITH s, t, p, u, COLLECT(t { .*, strength: s.strength}) AS tastings, COLLECT(distinct s.strengths) AS strengths, SUM(distinct s.strength) AS strengthSum
UNWIND tastings AS tasting
FOREACH (strength IN strengths |
    MATCH (tasting) WHERE tasting.strength = strength
	CREATE (n:Average {user: u._id, product: p._id}) SET 
    n.taste1 = apoc.coll.avg(tasting.t1),
    n.taste2 = apoc.coll.avg(tasting.t2),
    n.taste3 = apoc.coll.avg(tasting.t3))
WITH p, u
MATCH (a:Averge {user: u._id, product: p_id})
WITH sum(a.t1)/strengths AS avg1, sum(a.t2)/strengths AS avg2, sum(a.t3)/strengths AS avg3
MERGE (u)-[r:AVG_TASTE]-(p) ON CREATE SET r.t1 = avg1, r.t2 = avg2, r.t3 = avg3 ON MATCH SET r.t1 = avg1, r.t2 = avg2, r.t3 = avg3

So I tried the following

MATCH (u1:User {ID: 'A'})-[s:SIMILAR]-(u2:User)-[t:TASTED]-(p:Product {ID: 'A'})
WITH s, t, p, u1, u2, COLLECT(t { .*, strength: s.strength}) AS tasting, COLLECT(distinct s.strength) AS strengths, SUM(distinct s.strength) AS strengthSum
FOREACH (strength IN strengths |
	CREATE (n:Average {user: u1._id, product: p._id, strength: strength}) 
    )
WITH p, u1, u2, tasting, strengthSum
MATCH (a:Averge {user: u1._id, product: p._id})
MATCH (u2)-[t2:TASTED]-(p) WHERE t2.strength = a.strength
SET 
    a.taste1 = apoc.coll.avg(t2.t1),
    a.taste2 = apoc.coll.avg(t2.t2),
    a.taste3 = apoc.coll.avg(t2.t3)
WITH strengthSum, u1, p
MATCH (a2:Averge {user: u1._id, product: p._id})
WITH sum(a2.t1)/strengthSum AS avg1, sum(a2.t2)/strengthSum AS avg2, sum(a2.t3)/strengthSum AS avg3
MERGE (u1)-[r:AVG_SCORE]-(p) ON CREATE SET r.t1 = avg1, r.t2 = avg2, r.t3 = avg3 ON MATCH SET r.t1 = avg1, r.t2 = avg2, r.t3 = avg3
RETURN r

But the distinct s.strength does not gather distinct strength values and the query ends after the FOREACH loop.

dave
Node Clone

Found something by accident in a question on StackOverflow which solved this using

UNWIND RANGE(min,max)

And creating nodes for each strength with the averages.

For anyone reading this and looking for a solution, there is this.

MATCH 
    (u1:User {ID: 'A'})-[s:SIMILAR]-(:User)-[t:TASTED]-(p:Wine {ID: 'A'})
WITH u1, p,
    min(s.strength) as minS,
    max(s.strength) as maxS
WITH 
    u1, p, range(minS, maxS) AS range
UNWIND 
    range AS range2
WITH 
     distinct toInteger(range2) as i, u1, p
MATCH 
    (u1)-[s2:SIMILAR]-(:User)-[t2:TASTED]-(p) WHERE s2.strength = i
WITH 
    u1, p, i,
    COLLECT(toInteger(t2.taste1)) AS taste1,
    COLLECT(toInteger(t2.taste2)) AS taste2,
    COLLECT(toInteger(t2.taste3)) AS taste3
MERGE 
    (a:Average {user: u1._id, product: p._id, strength: i})
ON MATCH SET 
    a.taste1 = apoc.coll.avg(taste1) * i,
    a.taste2 = apoc.coll.avg(taste2) * i,
    a.taste3 = apoc.coll.avg(taste3) * i
ON CREATE SET 
    a.taste1 = apoc.coll.avg(taste1) * i,
    a.taste2 = apoc.coll.avg(taste2) * i,
    a.taste3 = apoc.coll.avg(taste3) * i
WITH u1, p
MATCH (a2:Average {user: u1._id, product: p._id})
WITH 
    u1, p,
    sum(a2.taste1)/sum(a2.strength) AS avg1, 
    sum(a2.taste2)/sum(a2.strength) AS avg2, 
    sum(a2.taste3)/sum(a2.strength) AS avg3
MERGE (u1)-[r:AVG_SCORE]->(p) 
    ON CREATE SET 
        r.taste1 = avg1, 
        r.taste2 = avg2, 
        r.taste3 = avg3
    ON MATCH SET 
        r.taste1 = avg1, 
        r.taste2 = avg2, 
        r.taste3 = avg3
RETURN r

If there is a more efficient method then great but this is a working solution for now.

Hey Dave, this should be easier:

MATCH (u:User)-[s:SIMILAR]->(:User)-[t:TASTED]->(p:Product) 
// turn key into a column
UNWIND keys(t) as k 
// group by strength and key
WITH u,p, s.strength AS str, k, avg(t[k]) AS tx 
RETURN u,p,sum(str*tx) AS weight

+--------------------------------------------+
| u               | p               | weight |
+--------------------------------------------+
| Node[0]{ID:"A"} | Node[1]{ID:"A"} | 147.5  |
+--------------------------------------------+

See also: http://console.neo4j.org/r/eb4xl2

OH!!! Thats rather elegant and has given me another idea for use case. Thank you @michael.hunger

I have just ran into a slight issue with that approach. I'm using the uuid module from Graphaware which throws in an error of avg() can only handle numerical values or null, is there a way to filter the keys that are only integer?

Have tried the below, but still get the numerical error

CASE t[k] WHEN toInteger(t[k]) THEN avg(t[k]) ELSE null END AS tx

this should work:

WHERE apoc.meta.isType(tx,'INTEGER')