Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
08-31-2018 04:29 PM
I know that there is no way of using MATCH, UNWIND or FILTER within a FOREACH loop, but do not what could solve the problem. I have the following query to start,
MATCH (u:User)-[s:SIMILAR]-(:User)-[t:TRIED]-(p:Product)
The TRIED has 10 properties with values 0-3 but the SIMILAR relationship has a property of s.strength which can be any integer value greater than 5
What I am trying to create is an average of averages using the following
sum((strength * (average of t1 at strength = 6) + (strength * (average of t1 at strength = 7) ..... )
example:
strength = 7
t1 = 1, 1, 1, 0
result = 5.75
the following grid is the desired average results from the first part of the equation
s.strength | strSum
| 7 | 8 | 9 |
t.t1 | 5.25 | 0 | 0 | 5.25
t.t2 | 19.25 | 16 | 27 | 62.25
t.t3 | 21 | 24 | 27 | 72
I then want to take the strSum value and divide it
sum((strength * (average of t1 at strength = 6) + (strength * (average of t1 at strength = 7) ..... ) / sum(distinct s.strength)
to give me an average of averages weighted by s.strength.
Does anyone have a way to iterate through all the values of s.strength and return these weighted averages or is there an APOC proceedure that would help with this. I have been looking at the periodic.rock_n_roll_while bit am not quite sure if this is right.
Below is some test data to create and setup the graph that would give the above grid results
CREATE (u:User {ID: 'A'}), (p:Product {ID: 'A'}) WITH u, p
MERGE (u)-[:SIMILAR {strength: 7}]-(:User {ID: 'B'})-[t:TASTED {t1: 1, t2: 3, t3: 3}]-(p) WITH u, p
MERGE (u)-[:SIMILAR {strength: 7}]-(:User {ID: 'C'})-[t:TASTED {t1: 1, t2: 3, t3: 3}]-(p) WITH u, p
MERGE (u)-[:SIMILAR {strength: 7}]-(:User {ID: 'D'})-[t:TASTED {t1: 1, t2: 3, t3: 3}]-(p) WITH u, p
MERGE (u)-[:SIMILAR {strength: 7}]-(:User {ID: 'E'})-[t:TASTED {t1: 0, t2: 2, t3: 3}]-(p) WITH u, p
MERGE (u)-[:SIMILAR {strength: 8}]-(:User {ID: 'F'})-[t:TASTED {t1: 1, t2: 2, t3: 3}]-(p) WITH u, p
MERGE (u)-[:SIMILAR {strength: 8}]-(:User {ID: 'G'})-[t:TASTED {t1: 1, t2: 2, t3: 3}]-(p) WITH u, p
MERGE (u)-[:SIMILAR {strength: 9}]-(:User {ID: 'H'})-[t:TASTED {t1: 0, t2: 3, t3: 3}]-(p)
EDIT: fix incorrect values in t1
Solved! Go to Solution.
09-01-2018 07:37 AM
Found something by accident in a question on StackOverflow which solved this using
UNWIND RANGE(min,max)
And creating nodes for each strength with the averages.
For anyone reading this and looking for a solution, there is this.
MATCH
(u1:User {ID: 'A'})-[s:SIMILAR]-(:User)-[t:TASTED]-(p:Wine {ID: 'A'})
WITH u1, p,
min(s.strength) as minS,
max(s.strength) as maxS
WITH
u1, p, range(minS, maxS) AS range
UNWIND
range AS range2
WITH
distinct toInteger(range2) as i, u1, p
MATCH
(u1)-[s2:SIMILAR]-(:User)-[t2:TASTED]-(p) WHERE s2.strength = i
WITH
u1, p, i,
COLLECT(toInteger(t2.taste1)) AS taste1,
COLLECT(toInteger(t2.taste2)) AS taste2,
COLLECT(toInteger(t2.taste3)) AS taste3
MERGE
(a:Average {user: u1._id, product: p._id, strength: i})
ON MATCH SET
a.taste1 = apoc.coll.avg(taste1) * i,
a.taste2 = apoc.coll.avg(taste2) * i,
a.taste3 = apoc.coll.avg(taste3) * i
ON CREATE SET
a.taste1 = apoc.coll.avg(taste1) * i,
a.taste2 = apoc.coll.avg(taste2) * i,
a.taste3 = apoc.coll.avg(taste3) * i
WITH u1, p
MATCH (a2:Average {user: u1._id, product: p._id})
WITH
u1, p,
sum(a2.taste1)/sum(a2.strength) AS avg1,
sum(a2.taste2)/sum(a2.strength) AS avg2,
sum(a2.taste3)/sum(a2.strength) AS avg3
MERGE (u1)-[r:AVG_SCORE]->(p)
ON CREATE SET
r.taste1 = avg1,
r.taste2 = avg2,
r.taste3 = avg3
ON MATCH SET
r.taste1 = avg1,
r.taste2 = avg2,
r.taste3 = avg3
RETURN r
If there is a more efficient method then great but this is a working solution for now.
08-31-2018 08:07 PM
I'll see what I can do. It's important to note that there's a minor flaw in your create query, you need WITH u, p
on the first line, to reuse the p
node, otherwise a new blank node is created and used for p
for the rest of the query.
08-31-2018 08:31 PM
I think I'm missing something about your calculations. Here's your example:
example:
strength = 7
t1 = 0, 0, 0, 1
result = 5.75
With a strength of 7 and the given t1 values, the formula doesn't seem to work out to your expected result:
(strength * (average of t1 at strength = 7))
= (7 * 0.25)
= 1.75
Can you clarify what calculation(s) you're using to produce the result in your example?
09-01-2018 12:04 AM
Yes you are right, it was late at night. My spread sheet was setup with the
t1 as 1, 1, 1, 0
So yes in the example I gave, it is as you say 1.75
Sorry for the confusion
09-01-2018 01:28 AM
Please edit the original post to fix it.
09-01-2018 02:13 AM
Edited.
I had tried last night to create a new (:Average) node inside of a FOREACH loop, and then I could just sum all the t1 values from the :Average nodes, in the following query, but cannot MATCH.
MATCH (u:User {ID: 'A'})-[s:SIMILAR]-(:User)-[t:TASTED]-(p:Product {ID: 'A'})
WITH s, t, p, u, COLLECT(t { .*, strength: s.strength}) AS tastings, COLLECT(distinct s.strengths) AS strengths, SUM(distinct s.strength) AS strengthSum
UNWIND tastings AS tasting
FOREACH (strength IN strengths |
MATCH (tasting) WHERE tasting.strength = strength
CREATE (n:Average {user: u._id, product: p._id}) SET
n.taste1 = apoc.coll.avg(tasting.t1),
n.taste2 = apoc.coll.avg(tasting.t2),
n.taste3 = apoc.coll.avg(tasting.t3))
WITH p, u
MATCH (a:Averge {user: u._id, product: p_id})
WITH sum(a.t1)/strengths AS avg1, sum(a.t2)/strengths AS avg2, sum(a.t3)/strengths AS avg3
MERGE (u)-[r:AVG_TASTE]-(p) ON CREATE SET r.t1 = avg1, r.t2 = avg2, r.t3 = avg3 ON MATCH SET r.t1 = avg1, r.t2 = avg2, r.t3 = avg3
So I tried the following
MATCH (u1:User {ID: 'A'})-[s:SIMILAR]-(u2:User)-[t:TASTED]-(p:Product {ID: 'A'})
WITH s, t, p, u1, u2, COLLECT(t { .*, strength: s.strength}) AS tasting, COLLECT(distinct s.strength) AS strengths, SUM(distinct s.strength) AS strengthSum
FOREACH (strength IN strengths |
CREATE (n:Average {user: u1._id, product: p._id, strength: strength})
)
WITH p, u1, u2, tasting, strengthSum
MATCH (a:Averge {user: u1._id, product: p._id})
MATCH (u2)-[t2:TASTED]-(p) WHERE t2.strength = a.strength
SET
a.taste1 = apoc.coll.avg(t2.t1),
a.taste2 = apoc.coll.avg(t2.t2),
a.taste3 = apoc.coll.avg(t2.t3)
WITH strengthSum, u1, p
MATCH (a2:Averge {user: u1._id, product: p._id})
WITH sum(a2.t1)/strengthSum AS avg1, sum(a2.t2)/strengthSum AS avg2, sum(a2.t3)/strengthSum AS avg3
MERGE (u1)-[r:AVG_SCORE]-(p) ON CREATE SET r.t1 = avg1, r.t2 = avg2, r.t3 = avg3 ON MATCH SET r.t1 = avg1, r.t2 = avg2, r.t3 = avg3
RETURN r
But the distinct s.strength does not gather distinct strength values and the query ends after the FOREACH loop.
09-01-2018 07:37 AM
Found something by accident in a question on StackOverflow which solved this using
UNWIND RANGE(min,max)
And creating nodes for each strength with the averages.
For anyone reading this and looking for a solution, there is this.
MATCH
(u1:User {ID: 'A'})-[s:SIMILAR]-(:User)-[t:TASTED]-(p:Wine {ID: 'A'})
WITH u1, p,
min(s.strength) as minS,
max(s.strength) as maxS
WITH
u1, p, range(minS, maxS) AS range
UNWIND
range AS range2
WITH
distinct toInteger(range2) as i, u1, p
MATCH
(u1)-[s2:SIMILAR]-(:User)-[t2:TASTED]-(p) WHERE s2.strength = i
WITH
u1, p, i,
COLLECT(toInteger(t2.taste1)) AS taste1,
COLLECT(toInteger(t2.taste2)) AS taste2,
COLLECT(toInteger(t2.taste3)) AS taste3
MERGE
(a:Average {user: u1._id, product: p._id, strength: i})
ON MATCH SET
a.taste1 = apoc.coll.avg(taste1) * i,
a.taste2 = apoc.coll.avg(taste2) * i,
a.taste3 = apoc.coll.avg(taste3) * i
ON CREATE SET
a.taste1 = apoc.coll.avg(taste1) * i,
a.taste2 = apoc.coll.avg(taste2) * i,
a.taste3 = apoc.coll.avg(taste3) * i
WITH u1, p
MATCH (a2:Average {user: u1._id, product: p._id})
WITH
u1, p,
sum(a2.taste1)/sum(a2.strength) AS avg1,
sum(a2.taste2)/sum(a2.strength) AS avg2,
sum(a2.taste3)/sum(a2.strength) AS avg3
MERGE (u1)-[r:AVG_SCORE]->(p)
ON CREATE SET
r.taste1 = avg1,
r.taste2 = avg2,
r.taste3 = avg3
ON MATCH SET
r.taste1 = avg1,
r.taste2 = avg2,
r.taste3 = avg3
RETURN r
If there is a more efficient method then great but this is a working solution for now.
09-01-2018 10:34 AM
Hey Dave, this should be easier:
MATCH (u:User)-[s:SIMILAR]->(:User)-[t:TASTED]->(p:Product)
// turn key into a column
UNWIND keys(t) as k
// group by strength and key
WITH u,p, s.strength AS str, k, avg(t[k]) AS tx
RETURN u,p,sum(str*tx) AS weight
+--------------------------------------------+
| u | p | weight |
+--------------------------------------------+
| Node[0]{ID:"A"} | Node[1]{ID:"A"} | 147.5 |
+--------------------------------------------+
See also: http://console.neo4j.org/r/eb4xl2
09-01-2018 10:39 AM
OH!!! Thats rather elegant and has given me another idea for use case. Thank you @michael.hunger
09-01-2018 12:33 PM
I have just ran into a slight issue with that approach. I'm using the uuid module from Graphaware which throws in an error of avg() can only handle numerical values or null, is there a way to filter the keys that are only integer?
Have tried the below, but still get the numerical error
CASE t[k] WHEN toInteger(t[k]) THEN avg(t[k]) ELSE null END AS tx
09-01-2018 05:17 PM
this should work:
WHERE apoc.meta.isType(tx,'INTEGER')
All the sessions of the conference are now available online