Neo4j

ammarmuqeet · 03-31-2020

I am currently going through the Category Hierarchy - Overall Similarity Algorithm exercise. While I was able to complete the query for the similarity algorithm, I could not understand the intuition behind removing the transitive relationships.

As I understood the similarity algorithm, the whole idea of calculating the similarity coefficient (with a cutoff of 0.75) was to measure the similarity between categories based on the business nodes within those categories. Having done that, why do we further need to check whether relationships are adjacent in the hierarchy or not?

Appreciate your help.

Regards,

Ammar

mark_needham · 08-06-2020

The goal is to build a hierarchy around the categories that we could then use to help users search for things. So if we have, say:

Category 1 <- Category 2 <- Category 3
Category 1 <- Category 3

We can remove the direct link between Category 1 and Category 3, since there is already a path between them through Category 2. We could leave the direct link there if we wanted; removing it is mostly to make the hierarchy cleaner.
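
To make the idea concrete, here is a minimal Python sketch of that clean-up step. It is not the Cypher query from the training exercise, just an illustration: the function name `remove_transitive_links` and the representation of the hierarchy as (child, parent) pairs are assumptions made up for this example, and it assumes the hierarchy has no cycles.

```python
def remove_transitive_links(edges):
    """Drop every direct (child, parent) link that a longer path already implies.

    `edges` is a set of (child, parent) pairs; this layout is an assumption
    chosen for illustration, not the data model used in the course.
    """
    parents = {}
    for child, parent in edges:
        parents.setdefault(child, set()).add(parent)

    def reachable_via_longer_path(start, target):
        # Depth-first search that must pass through at least one
        # intermediate category before reaching `target`.
        stack = [p for p in parents.get(start, ()) if p != target]
        seen = {start, *stack}
        while stack:
            node = stack.pop()
            for nxt in parents.get(node, ()):
                if nxt == target:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    return {(c, p) for c, p in edges if not reachable_via_longer_path(c, p)}


# The example from this thread: Category 3 -> Category 2 -> Category 1,
# plus the redundant direct link Category 3 -> Category 1.
hierarchy = {
    ("Category 2", "Category 1"),
    ("Category 3", "Category 2"),
    ("Category 3", "Category 1"),
}
cleaned = remove_transitive_links(hierarchy)
```

After this runs, `cleaned` keeps only the two adjacent links, and Category 3 is still connected to Category 1 through Category 2, so nothing is lost for search purposes.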

This technique was suggested by my colleague @jesus.barrasa. You can read more about it in a blog post that he wrote a few years ago - https://jbarrasa.com/2017/03/31/quickgraph5-learning-a-taxonomy-from-your-tagged-data/

Graph Algorithms Online Training - Similarity Algorithm Clarification