How does this CQL generate cosine similarity?

I am a beginner in Neo4j. I have been reading about how to build a recommendation system in graph databases, and I came across cosine similarity. The CQL given here is meant to calculate the cosine similarity.

In line 3 they appear to multiply 2 lists. So how are the 2 lists (i.e. r1.rating and r2.rating) being multiplied, and how do we get the cosine similarity value from this code? A step-by-step explanation would be appreciated.
The following code snippet can be found here (Query 7).

MATCH (c1:Customer)-[r1:RATED]->(p:Product)<-[r2:RATED]-(c2:Customer)
WITH
SUM(r1.rating*r2.rating) as dot_product,
SQRT( REDUCE(x=0.0, a IN COLLECT(r1.rating) | x + a^2) ) as r1_length,
SQRT( REDUCE(y=0.0, b IN COLLECT(r2.rating) | y + b^2) ) as r2_length,
c1,c2
MERGE (c1)-[s:SIMILARITY]-(c2)
SET s.similarity = dot_product / (r1_length * r2_length)

1 REPLY 1

poonsfci

Hi, the cosine similarity of two vectors A and B is (dot product of A and B) / (magnitude of A * magnitude of B).
It might be easier to refer to the Cosine similarity article on Wikipedia: https://en.wikipedia.org/wiki/Cosine_similarity

  • SUM(r1.rating*r2.rating) is for the dot product of A and B
  • SQRT( REDUCE(x=0.0, a IN COLLECT(r1.rating) | x + a^2) ) as r1_length is the magnitude of A
  • SQRT( REDUCE(y=0.0, b IN COLLECT(r2.rating) | y + b^2) ) as r2_length is the magnitude of B
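To address the "multiplying 2 lists" part of the question: in the query, r1.rating and r2.rating are not lists. The MATCH produces one row per product that both customers rated, and on each row both ratings are single numbers. Because c1 and c2 are carried through the WITH alongside aggregating functions, SUM and COLLECT run once per (c1, c2) pair over those rows. Here is a rough Python sketch of the same arithmetic for one pair of customers. The ratings and the names `rows` and `cosine_similarity` are made up for illustration and are not part of the original query.

```python
import math

# Hypothetical data: one tuple per product that BOTH customers rated,
# i.e. one row returned by the MATCH for a single (c1, c2) pair.
# Each tuple is (product, r1.rating, r2.rating).
rows = [
    ("p1", 4.0, 5.0),
    ("p2", 3.0, 1.0),
    ("p3", 5.0, 4.0),
]


def cosine_similarity(rows):
    # SUM(r1.rating * r2.rating): multiply the two numbers on each row, then add up
    dot_product = sum(r1 * r2 for _, r1, r2 in rows)
    # SQRT(REDUCE(x=0.0, a IN COLLECT(r1.rating) | x + a^2)): magnitude of c1's ratings
    r1_length = math.sqrt(sum(r1 ** 2 for _, r1, _ in rows))
    # Same for c2's ratings
    r2_length = math.sqrt(sum(r2 ** 2 for _, _, r2 in rows))
    return dot_product / (r1_length * r2_length)


similarity = cosine_similarity(rows)
print(similarity)
```

With these made-up ratings the dot product is 4*5 + 3*1 + 5*4 = 43 and the magnitudes are sqrt(50) and sqrt(42), so the result is 43 / sqrt(2100). Note that only products rated by both customers enter the magnitudes, since those are the only rows the MATCH returns.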

Hope it helps.