Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
09-18-2020 07:18 PM
Hey there!
I am building a simple social network like recommendation engine that should score the users connections to each other.
I have a question about adding scores to relationships:
(User)-[:KNOWS_VIA_SCHOOL]->(User)
(User)-[:KNOWS_VIA_PHONE]->(User)
(User)-[:WORKED_TOGETHER]->(User)
(User)-[:TRAVELLED_TOGETHER]->(User)
How can I create a simple recommendation engine for the data above based on the following constraints:
- For instance, if a user A has a
KNOWS_VIA_SCHOOL
rel with another user B it has a score of 3, and if user A has aKNOWS_VIA_PHONE
rel with user B it has a score of 2.- If user A has two rels with user B, we should sum the score of the rels. For example now user A and user B have a connection score of 5 (3 because of KNOWS_VIA_SCHOOL and 2 because of KNOWS_VIA_PHONE)
The plan I have so far is to query the different rels and collect them in collections as a tuple of [user, score] as seen below:
Match(user: User { id: $user_id })
OPTIONAL MATCH (user)-[:KNOWS_VIA_SCHOOL]->(schoolUser)
// Collect the [user, score] array
WITH user, collect([schoolUser, 3]) as usersSoFar
OPTIONAL MATCH (user)-[:KNOWS_VIA_PHONE]->(phoneUser)
// Collect the [user, score] array -- note that the below code doesn't account for duplicates
WITH user, usersSoFar + collect([phoneUser, 2]) as usersSoFar
// return usersSoFar ordered by score
...etc
Notice that I used simple direct relationships for the example below but the real data has multiple hops and is not as straighforward.
The issue here is that the usersSoFar
collection now has duplicates that I can't merge together and sum the scores of.
How it is now: [[{id: 1}, score: 3], [{id: 1}, score: 2], [{id: 2}, score: 2]]
How it should be: [[{id: 1}, score: 5], [{id: 2}, score: 2]]
// similar behavior to lodash.mergeBy
Notes:
This is a simplified version of the data, not all user relationships are direct. for example sometimes the rels are as follows (User)--()--(User)
09-20-2020 05:34 AM
One option would be to add a score value to the relationship.
Say you are creating KNOWS_VIA_SCHOOL between u1 and u2 you can set the property like this.
MERGE (u1)-[r:KNOWS_VIA_SCHOOL]->(u20
SET r.score=3
You can do the same for all the relationships.
Now to find the total score this way
MATCH(user: User { id: $user_id })
MATCH (user)-[r]->(other:User)
return u, other, sum(r.score) as score
This will give a score between user and other.
Other option is to have a separate relationship, says HAS_REL which contains the aggregated score between those users. You need to keep the scores updated when ever the relationships are created or removed.
Advantage of having this relationship is you can use only this relationship to traverse among users and do complex operations.
09-23-2020 06:05 PM
@anthapu Thanks for the ideas!
Adding the score value to the relationship works fine in case of the query above due to the simple behavior of fetching via one pattern. The issue in my query is that not everything can be done on a single relationship level, so I might need to add another MATCH there and collect the results in a collection and UNWIND later making sure that I only leave distinct users.
eg:
MATCH(user: User { id: $user_id })
MATCH (user)-[r]->(other:User)
with user, collect([other, r]) as dataSoFar
MATCH (user)-[x]-()-[y]-(other)
with user dataSoFar + collect([other, y]) // here we choose the y rel not x since it's what matters
// here we are stuck with the collection above which might have the same user twice, with different rels eg: [[Tom, { score: 2 }], [Elona, { score: 3 }], [Tom, { score: 5 }]] -- Tom is duplicated here, he should be there once with a score of 7
The issue now is that the collection will have either:
– the user without the relationship, in which case I don't have the score anymore
– a tuple of the user and the relationship eg: [user, rel]. But in this case I have no way of summing the scores later
I think the idea of enriching the graph beforehand makes more sense. However how would 2nd and 3rd degree connections factor in with that? Would I have to create 2nd degree knows relationships every time 2 people connect?
I am not sure about how this would behave performance wise.
09-24-2020 04:14 AM
@A-Tokyo I am failing to understand your requirement. Why are you trying to manually look at each step?
Can you not use path using variable expand?
MATCH(user: User { id: $user_id })
MATCH p=(user)-[r*1..7]->(other:User)
WITH p, other, reduce(s = 0, x IN relationships(p) | s + x.score) as totalScore
return u, other, totalScore
This gives the whole path cost to reach other.
Also, it wold always be better to use directions in your query.
09-24-2020 06:46 AM
I am not trying to manually look at each step, I simply want a combination of different paths between users.
I think the path variable might work, have not tried it before. Will give it a try.
I can make do with the total cost of relationships in the path. The main point was that not all knows relationships are direct. Some of them have to be indirect.
The reason I try to look at each step is to combine the scores of all the ways one person might know the other.
The relationships are a lot, and I would like to either combine the scores or at least show the user the relationship label. eg: when knows_via_phone users are fetched we can show some label like You know Tony via phonebook
. The graph is not enriched with direct rels before hand like mentioned before.
Thanks for the note! Direction was omitted on purpose for the sake of easiness
09-25-2020 04:03 AM
MATCH(user: User { id: $user_id })
MATCH p=(user)-[r*1..7]->(other:User)
WITH p, other, reduce(s = 0, x IN relationships(p) | s + x.score) as score
return u, other, sum(totalScore)
If you use this query you will get sum of all the paths no matter how you reach the other person.
This simple query
MATCH(user: User { id: $user_id })
MATCH (user)-[r]->(other:User)
return u, other, sum(r.score) as score
returns the sum of all the relationships the user has.
If you do want to have the types also you can change the query to
MATCH(user: User { id: $user_id })
MATCH (user)-[r]->(other:User)
return u, other, sum(r.score) as score, collect(type(r)) as relations
This will give the total score and how the users are related. You can use this information in the UI to show the message like you mentioned
For longer paths you might have to collect the paths in UI or calculate the sum in cypher and list of paths and decipher them in UI
09-25-2020 06:09 AM
Hi!
have you considered to use "personalized page rank" ?
We have developed a fast implementation for Neo4j: look here
All the sessions of the conference are now available online