Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
10-13-2020 09:53 AM
Hi!
We are trying to implement a very simple content based recommendation query for our database.
The database for now only includes the following nodes and relationships:
Nodes & properties:
(Course {course_id})
(Game {game_id})
(Coach {coach_id})
Relationships:
(:Course)-[:BELONGSTO]->(:Game)
(:Coach)-[:PUBLISHED]->(:Course)
And given a course_id
we want to travers the graph to RETURN
20 other courses to recommend to the user.
After researching, testing and evaluating the profile(db hits, etc) with different approaches we got the following query:
Query
MATCH (c:Course {course_id: $course._id}) CALL apoc.path.subgraphNodes(c, {limit: 50,minLevel: 1,maxLevel: 2}) YIELD node WITH node WHERE labels(node)=["Course"] RETURN node LIMIT 20
The limit:50
for the apoc path expander is more a hack to prevent traversing all possibilities, but we cannot limit 20 at that point since it's taking into account all nodes involved (Game, Coach and Course).
So the "problem" for us is that, doing it this way subgraphNodes
looks like is trying first one path which in our case is always the one that relates the specific course with the coach: (:Coach)-[:PUBLISHED]->(:Course)
and therefore is giving back only courses from that specific coach.
We never get to the point where it recommends courses from the same game but a different coach.
Why subgraphNodes
is always using that first path, is it related with the order of the id's?
And we would like to get roughly the same amount per each possible path, but we couldn't find a way for that.
If anyone knows a better approach we would be much appreciated and thank you so much for your time
Neo4j version in Aura: 4
JavaScript driver: 4.1.2
10-14-2020 09:53 PM
I created a simple graph with your nodes and relationships.
MERGE (c:Course {course_id: toInteger(20)})
MERGE (c1:Course {course_id: toInteger(20)})
MERGE (c2:Course {course_id: toInteger(30)})
MERGE (c3:Course {course_id: toInteger(40)})
MERGE (c4:Course {course_id: toInteger(45)})
MERGE (c5:Course {course_id: toInteger(50)})
MERGE (g:Game {game_id: toInteger(120)})
MERGE (b:Coach {coach_id: toInteger(1000)})
MERGE (b1:Coach {coach_id: toInteger(1100)})
MERGE (c)-[:BELONGSTO]->(g)
MERGE (c1)-[:BELONGSTO]->(g)
MERGE (c2)-[:BELONGSTO]->(g)
MERGE (c3)-[:BELONGSTO]->(g)
MERGE (c4)-[:BELONGSTO]->(g)
MERGE (c5)-[:BELONGSTO]->(g)
MERGE (b)-[:PUBLISHED]->(c)
MERGE (b)-[:PUBLISHED]->(c1)
MERGE (b)-[:PUBLISHED]->(c2)
MERGE (b)-[:PUBLISHED]->(c3)
MERGE (b1)-[:PUBLISHED]->(c4)
MERGE (b1)-[:PUBLISHED]->(c5)
Result:
Since the focus is on a particular game and associated courses and coaches:
MATCH (g:Game)
CALL apoc.path.subgraphNodes(g, {})
YIELD node
RETURN node
Result: as above picture
In apoc.path.subgraphNodes you can add labelFilter and relationshipFilter:
MATCH (g:Game)
CALL apoc.path.subgraphNodes(g, {labelFilter:'Course'})
YIELD node
RETURN node
Result:
10-15-2020 01:02 AM
Hello @ameyasoft and thank you so much for your answer.
Our problem is more the following (sorry maybe I didn't explain well before):
This would maybe be a more realistic graph:
MERGE (c:Course {course_id: toInteger(20)})
MERGE (c1:Course {course_id: toInteger(20)})
MERGE (c2:Course {course_id: toInteger(30)})
MERGE (c3:Course {course_id: toInteger(40)})
MERGE (c4:Course {course_id: toInteger(45)})
MERGE (c5:Course {course_id: toInteger(50)})
MERGE (c6:Course {course_id: toInteger(55)})
MERGE (g:Game {game_id: toInteger(120)})
MERGE (g1:Game {game_id: toInteger(130)})
MERGE (b:Coach {coach_id: toInteger(1000)})
MERGE (b1:Coach {coach_id: toInteger(1100)})
MERGE (c)-[:BELONGSTO]->(g)
MERGE (c1)-[:BELONGSTO]->(g)
MERGE (c2)-[:BELONGSTO]->(g)
MERGE (c3)-[:BELONGSTO]->(g)
MERGE (c4)-[:BELONGSTO]->(g)
MERGE (c5)-[:BELONGSTO]->(g)
MERGE (c6)-[:BELONGSTO]->(g1)
MERGE (b)-[:PUBLISHED]->(c)
MERGE (b)-[:PUBLISHED]->(c1)
MERGE (b)-[:PUBLISHED]->(c2)
MERGE (b)-[:PUBLISHED]->(c3)
MERGE (b)-[:PUBLSIHED]->(c6)
MERGE (b1)-[:PUBLISHED]->(c4)
MERGE (b1)-[:PUBLISHED]->(c5)
And our start point would have to be a specific course instead of the game, using the course_id
property. This would be I think the best query for the path with max of two hops, considering there would be more games, coaches and courses.
MATCH (g:Course {course_id: 40})
CALL apoc.path.subgraphNodes(g, {maxLevel: 2})
YIELD node
RETURN node
With the following graph:
Now if we wanted to stop once we got 3 other courses how do you think we could do it? Instead of having to travers all the subgraph and then limit.
And if we try with the real database the following query:
MATCH (g:Course {course_id: "5ece57514bafe30004054b70"})
CALL apoc.path.subgraphNodes(g, {maxLevel: 2, limit:20})
YIELD node
RETURN node
We get:
Which only has courses from the coach it was published but not courses from other coaches that would come from the game. Why is that the behaviour?
Thank you for your time 🙂
10-15-2020 09:41 AM
Try this:
MATCH (g:Course {course_id: "5ece57514bafe30004054b70"})
CALL apoc.path.subgraphNodes(g, {labelFilter:'Coach|Game|Course', maxLevel: 2, limit:20})
YIELD node
RETURN node
This should show all the courses connected to Game node.
10-15-2020 09:47 AM
You can get same results with this Cypher:
match (c:Coach)-[]-(b:Course)-[]-(g:Game)
where b.course_id = 40
match (d:Coach)-[]-(e:Course)-[]-(g)
Where d.coach_id <> c.coach_id and e.course_id <> b.course_id
return c, b, g, d, e
10-15-2020 05:05 PM
When using subgraphNodes(), it's using NODE_GLOBAL uniqueness during traversal, meaning that once you find a node in the graph, that node will never be returned from any other path. Once it's visited, it will not be visited by another means.
So once courses are visited via one path (from a specific coach), then it won't ever be returned by any other path. The first one wins.
It also uses breadth-first expansion by default, so the closest nodes will be found before those at a longer distance away. You can use bfs:false
in the config map to change that to a depth-first expansion, which may find nodes at a longer distance away (though this is not guaranteed).
10-16-2020 02:28 AM
Thank you so much @ameyasoft and @andrew.bowman for your time. I think now we are very close to the outcome we were looking for.
Thanks again for your help
10-20-2020 08:36 AM
@ameyasoft @andrew.bowman
At the end we use the following query, also it's a bit different since we added the [:VIEWED]
relationship:
MATCH (c:Course {course_id: "$course_id"}) MATCH (u:Player {player_id: "$player_id"})-[:VIEWED]-(v:Course) CALL apoc.path.subgraphNodes(c, {maxLevel: 2, limit:20, relationshipFilter: "VIEWED|PUBLISHED|BELONGSTO", blacklist: [v], labelFilter:">Course"}) YIELD node RETURN node
We still have some doubts though but we are closer.
All the sessions of the conference are now available online