Neo4j

axel · ‎10-13-2020

Hi!

We are trying to implement a very simple content based recommendation query for our database.

The database for now only includes the following nodes and relationships:

Nodes & properties:

(Course {course_id})
(Game {game_id})
(Coach {coach_id})

Relationships:

(:Course)-[:BELONGSTO]->(:Game)
(:Coach)-[:PUBLISHED]->(:Course)

And given a course_id we want to travers the graph to RETURN 20 other courses to recommend to the user.

After researching, testing and evaluating the profile(db hits, etc) with different approaches we got the following query:

Query
MATCH (c:Course {course_id: $course._id}) CALL apoc.path.subgraphNodes(c, {limit: 50,minLevel: 1,maxLevel: 2}) YIELD node WITH node WHERE labels(node)=["Course"] RETURN node LIMIT 20

The limit:50 for the apoc path expander is more a hack to prevent traversing all possibilities, but we cannot limit 20 at that point since it's taking into account all nodes involved (Game, Coach and Course).

So the "problem" for us is that, doing it this way subgraphNodes looks like is trying first one path which in our case is always the one that relates the specific course with the coach: (:Coach)-[:PUBLISHED]->(:Course) and therefore is giving back only courses from that specific coach.
We never get to the point where it recommends courses from the same game but a different coach.

Why subgraphNodes is always using that first path, is it related with the order of the id's?

And we would like to get roughly the same amount per each possible path, but we couldn't find a way for that.

If anyone knows a better approach we would be much appreciated and thank you so much for your time

Neo4j version in Aura: 4
JavaScript driver: 4.1.2

ameyasoft · ‎10-14-2020

I created a simple graph with your nodes and relationships.

MERGE (c:Course {course_id: toInteger(20)})
MERGE (c1:Course {course_id: toInteger(20)})
MERGE (c2:Course {course_id: toInteger(30)})
MERGE (c3:Course {course_id: toInteger(40)})
MERGE (c4:Course {course_id: toInteger(45)})
MERGE (c5:Course {course_id: toInteger(50)})

MERGE (g:Game {game_id: toInteger(120)})

MERGE (b:Coach {coach_id: toInteger(1000)})
MERGE (b1:Coach {coach_id: toInteger(1100)})

MERGE (c)-[:BELONGSTO]->(g)
MERGE (c1)-[:BELONGSTO]->(g)
MERGE (c2)-[:BELONGSTO]->(g)
MERGE (c3)-[:BELONGSTO]->(g)
MERGE (c4)-[:BELONGSTO]->(g)
MERGE (c5)-[:BELONGSTO]->(g)

MERGE (b)-[:PUBLISHED]->(c)
MERGE (b)-[:PUBLISHED]->(c1)
MERGE (b)-[:PUBLISHED]->(c2)
MERGE (b)-[:PUBLISHED]->(c3)

MERGE (b1)-[:PUBLISHED]->(c4)
MERGE (b1)-[:PUBLISHED]->(c5)

Result:

Since the focus is on a particular game and associated courses and coaches:

MATCH (g:Game) 
CALL apoc.path.subgraphNodes(g, {}) 
YIELD node
RETURN node

Result: as above picture
In apoc.path.subgraphNodes you can add labelFilter and relationshipFilter:

MATCH (g:Game) 
CALL apoc.path.subgraphNodes(g, {labelFilter:'Course'}) 
YIELD node
RETURN node

Result:

axel · ‎10-15-2020

Hello @ameyasoft and thank you so much for your answer.

Our problem is more the following (sorry maybe I didn't explain well before):
This would maybe be a more realistic graph:

MERGE (c:Course {course_id: toInteger(20)})
MERGE (c1:Course {course_id: toInteger(20)})
MERGE (c2:Course {course_id: toInteger(30)})
MERGE (c3:Course {course_id: toInteger(40)})
MERGE (c4:Course {course_id: toInteger(45)})
MERGE (c5:Course {course_id: toInteger(50)})
MERGE (c6:Course {course_id: toInteger(55)})

MERGE (g:Game {game_id: toInteger(120)})
MERGE (g1:Game {game_id: toInteger(130)})

MERGE (b:Coach {coach_id: toInteger(1000)})
MERGE (b1:Coach {coach_id: toInteger(1100)})

MERGE (c)-[:BELONGSTO]->(g)
MERGE (c1)-[:BELONGSTO]->(g)
MERGE (c2)-[:BELONGSTO]->(g)
MERGE (c3)-[:BELONGSTO]->(g)
MERGE (c4)-[:BELONGSTO]->(g)
MERGE (c5)-[:BELONGSTO]->(g)
MERGE (c6)-[:BELONGSTO]->(g1)

MERGE (b)-[:PUBLISHED]->(c)
MERGE (b)-[:PUBLISHED]->(c1)
MERGE (b)-[:PUBLISHED]->(c2)
MERGE (b)-[:PUBLISHED]->(c3)
MERGE (b)-[:PUBLSIHED]->(c6)

MERGE (b1)-[:PUBLISHED]->(c4)
MERGE (b1)-[:PUBLISHED]->(c5)

And our start point would have to be a specific course instead of the game, using the course_id property. This would be I think the best query for the path with max of two hops, considering there would be more games, coaches and courses.

MATCH (g:Course {course_id: 40}) 
CALL apoc.path.subgraphNodes(g, {maxLevel: 2}) 
YIELD node
RETURN node

With the following graph:

Now if we wanted to stop once we got 3 other courses how do you think we could do it? Instead of having to travers all the subgraph and then limit.

And if we try with the real database the following query:

MATCH (g:Course {course_id: "5ece57514bafe30004054b70"}) 
CALL apoc.path.subgraphNodes(g, {maxLevel: 2, limit:20}) 
YIELD node
RETURN node

We get:

Which only has courses from the coach it was published but not courses from other coaches that would come from the game. Why is that the behaviour?

Thank you for your time 🙂

ameyasoft · ‎10-15-2020

Try this:
MATCH (g:Course {course_id: "5ece57514bafe30004054b70"}) 
CALL apoc.path.subgraphNodes(g, {labelFilter:'Coach|Game|Course', maxLevel: 2, limit:20}) 
YIELD node
RETURN node

This should show all the courses connected to Game node.

ameyasoft · ‎10-15-2020

You can get same results with this Cypher:

match (c:Coach)-[]-(b:Course)-[]-(g:Game)
where b.course_id = 40
match (d:Coach)-[]-(e:Course)-[]-(g)
Where d.coach_id <> c.coach_id and e.course_id <> b.course_id
return c, b, g, d, e

andrew_bowman · ‎10-15-2020

When using subgraphNodes(), it's using NODE_GLOBAL uniqueness during traversal, meaning that once you find a node in the graph, that node will never be returned from any other path. Once it's visited, it will not be visited by another means.

So once courses are visited via one path (from a specific coach), then it won't ever be returned by any other path. The first one wins.

It also uses breadth-first expansion by default, so the closest nodes will be found before those at a longer distance away. You can use bfs:false in the config map to change that to a depth-first expansion, which may find nodes at a longer distance away (though this is not guaranteed).

axel · ‎10-16-2020

Thank you so much @ameyasoft and @andrew.bowman for your time. I think now we are very close to the outcome we were looking for.

Thanks again for your help

axel · ‎10-20-2020

@ameyasoft @andrew.bowman
At the end we use the following query, also it's a bit different since we added the [:VIEWED] relationship:

MATCH (c:Course {course_id: "$course_id"}) MATCH (u:Player {player_id: "$player_id"})-[:VIEWED]-(v:Course) CALL apoc.path.subgraphNodes(c, {maxLevel: 2, limit:20, relationshipFilter: "VIEWED|PUBLISHED|BELONGSTO", blacklist: [v], labelFilter:">Course"}) YIELD node RETURN node

We still have some doubts though but we are closer.

Neo4j

How apoc.path.subgraphNodes travers the graph?