08-17-2020 04:44 PM
I'm getting unexpected results when working through the Cosine Similarity examples in the documentation (https://neo4j.com/docs/graph-data-science/current/alpha-algorithms/cosine/).
I'm using Neo4j developer edition 4.0.4 and GDS 1.3.
(1) The documentation seems to miss that you need a native projection to make the streaming examples work. You can easily add one by passing nodeProjection:'*', relationshipProjection:'*' within the configuration map, or by using a pre-created named projection such as CALL gds.graph.create('blah', '*', '*') YIELD graphName, nodeCount, relationshipCount; but that should probably be shown.
(2) The code as presented returns mirrored results, for example "Praveena" "Karin" 1.0 and "Karin" "Praveena" 1.0. The algorithm is symmetric, and the documented examples don't show these duplicate entries, but I don't see a way of removing them other than some sort of comparison on id(node), which is a bit ugly.
(3) The results for Zhen - Anya and Zhen - Karin seem unexpected to me. Both pairs should return 0, as they have no dimensions in common. The documented example does show both returning 0, but in my results Zhen - Anya gives 0 when streaming and Zhen - Karin has no result at all. Passing empty vectors (indicating no dimensions in common) into gds.alpha.similarity.cosine() also generates an error instead of the expected 0.
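For anyone wanting to reproduce the manual calculation behind point (3) outside the database, here is a minimal Python sketch. It assumes that a dimension is skipped whenever either vector holds NaN there (which is how I read the NaN placeholder used in the queries), and that a pair with no shared dimensions should score 0. The function name and the sample vectors are hypothetical and are not taken from the documentation's data set.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity over the dimensions where neither value is NaN.

    Assumption: a pair with no shared dimensions (or a zero-length
    vector on the shared dimensions) scores 0.0 instead of raising.
    """
    # Keep only the dimensions that both vectors actually rate.
    shared = [(x, y) for x, y in zip(a, b)
              if not (math.isnan(x) or math.isnan(y))]
    if not shared:
        return 0.0

    dot = sum(x * y for x, y in shared)
    norm_a = math.sqrt(sum(x * x for x, _ in shared))
    norm_b = math.sqrt(sum(y * y for _, y in shared))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    nan = float("nan")
    # Hypothetical weights: the two people rate disjoint cuisines.
    print(cosine_similarity([9.0, nan], [nan, 7.0]))
    # Hypothetical weights: exactly one cuisine in common.
    print(cosine_similarity([9.0, nan], [3.0, 7.0]))
```

Under these assumptions, any two people who share exactly one rated dimension score 1.0 regardless of the actual scores, which may be worth keeping in mind when reading rows such as the Praveena - Karin one.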
Handy queries:
// Person name and data being passed into gds.alpha.similarity.cosine.stream
MATCH (p:Person), (c:Cuisine)
OPTIONAL MATCH (p)-[likes:LIKES]->(c)
WITH p, {item:id(p), weights: collect(coalesce(likes.score, gds.util.NaN()))} AS userData
WITH p, collect(userData) AS data
RETURN p.name, data
The query that provides unexpected results compared to manual calculations and documented results:
MATCH (p:Person), (c:Cuisine)
OPTIONAL MATCH (p)-[likes:LIKES]->(c)
WITH {item:id(p), weights: collect(coalesce(likes.score, gds.util.NaN()))} AS userData
WITH collect(userData) AS data
CALL gds.alpha.similarity.cosine.stream({nodeProjection:'*', relationshipProjection:'*', data: data})
YIELD item1, item2, count1, count2, similarity
RETURN gds.util.asNode(item1).name AS from, gds.util.asNode(item2).name AS to, similarity
ORDER BY similarity DESC
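On the mirrored rows from point (2): if the results are post-processed on the client side, one way to drop the duplicates is to keep only the first row seen for each unordered pair of ids. This is a rough Python sketch; the function name and the (item1, item2, similarity) row layout are assumptions for illustration. Inside Cypher the same idea amounts to a filter such as item1 < item2 on the yielded rows, which is the id comparison mentioned above.

```python
def drop_mirrored_pairs(rows):
    """Keep the first row for each unordered (item1, item2) pair.

    rows: iterable of (item1, item2, similarity) tuples, where item1
    and item2 are node ids. Input order is preserved.
    """
    seen = set()
    unique = []
    for item1, item2, similarity in rows:
        # Normalise the pair so (1, 2) and (2, 1) map to the same key.
        key = (min(item1, item2), max(item1, item2))
        if key not in seen:
            seen.add(key)
            unique.append((item1, item2, similarity))
    return unique


if __name__ == "__main__":
    # Hypothetical node ids and scores.
    sample = [(1, 2, 1.0), (2, 1, 1.0), (1, 3, 0.5)]
    print(drop_mirrored_pairs(sample))
```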
08-24-2020 09:44 AM
Initial observation: this is an alpha-tier algorithm, so I don't think we should be too surprised by bugs in the algorithm or the documentation. I've seen issues with other alpha-status GDS algorithms. You might file a GitHub issue here: https://github.com/neo4j/graph-data-science
Also, I stumbled across a new issue with Neo4j 4.1 while working through these examples: a change in behavior breaks this part of the Cypher query
collect(coalesce(likes.score, gds.util.NaN()))
After some investigation I narrowed it down to just this
return likes.score
which throws a Neo.DatabaseError.Statement.ExecutionFailed error in Neo4j 4.1 (at least when likes comes back from an OPTIONAL MATCH). The error detail is blank, but the logs, together with the fact that the fix below works, seem to indicate that the cause is an attempt to "reference a property on a null relationship".
So there are two obvious workarounds, though there are probably other ways.
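HACK FIX1: at the relationship level, substitute a map with a null score when the relationship is null (an inline conditional). Note that the test has to be IS NULL, because likes = null always evaluates to null in Cypher and never matches.
(CASE
WHEN likes IS NULL THEN {score: null}
ELSE likes
END)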
HACK FIX2: at the property level (again testing with IS NULL, since likes = null never evaluates to true)
(CASE
WHEN likes IS NULL THEN null
ELSE likes.score
END)
Full Cypher now:
MATCH (p:Person), (c:Cuisine)
OPTIONAL MATCH (p)-[likes:LIKES]->(c)
WITH {item:id(p), weights: collect(coalesce((CASE
WHEN likes IS NULL THEN null
ELSE likes.score
END), gds.util.NaN()))} AS userData
WITH collect(userData) AS data
CALL gds.alpha.similarity.cosine.stream({data: data})
YIELD item1, item2, count1, count2, similarity
RETURN gds.util.asNode(item1).name AS from, gds.util.asNode(item2).name AS to, similarity
ORDER BY similarity DESC