Approximate nearest neighbour algorithm does not return similar nodes

I'm trying to find similar images using the approximate nearest neighbour (ANN) algorithm with cosine similarity. This is the query:

MATCH (p:Image)
WITH {item:id(p), weights: p.vec} AS userData
WITH collect(userData) AS data
CALL gds.alpha.ml.ann.write({
nodeProjection: '*',
relationshipProjection: '*',
data: data,
topK:20,
algorithm: 'cosine',
writeRelationshipType:"SIMILAR_APPROX",
similarityCutoff: 0.1,
p:0.5,
maxIterations:50
})
YIELD nodes, similarityPairs, computations
RETURN nodes,
apoc.number.format(similarityPairs) AS similarityPairs,
apoc.number.format(computations) AS computations

But when I search for images similar to one specific image, none of the results are from the same category as that image (dolphin). I have 9119 nodes in my database. Here's the query for finding images similar to one specific image:

MATCH (r:Image) WHERE id(r)=1932
WITH r,
[(r)-[:SIMILAR_APPROX]->(i)| i.path ] AS similarNodes
RETURN similarNodes

input image: 2X_4_42c3739603456e02df4e3742b80f6e44786f35bd.jpeg
one example of output images: 2X_7_707bf4051d8f56a247ab5811553dec38160ae480.jpeg

Am I missing some parameters in the algorithm? Why am I getting results from other categories when I clearly have more similar images in the database?

Thank you in advance!

1 REPLY

What are you passing to ANN to measure similarity on? The node property in p.vec?

I would check the similarity of the two images directly using cosine similarity (https://neo4j.com/docs/graph-data-science/current/alpha-algorithms/cosine/). It's possible that something is off in your image embedding that makes the two vectors quite similar. The categories you're referring to aren't available to ANN, so the result is based solely on the values in p.vec.
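
As a minimal sketch of that direct check, something like the query below should work. It assumes the alpha function gds.alpha.similarity.cosine is available in your GDS version (the same generation as gds.alpha.ml.ann), and $otherId is a hypothetical parameter standing in for the internal id of one of the unexpected result images:

```cypher
// Compare the query image with one unexpected neighbour, using only p.vec.
// $otherId is a placeholder parameter (assumption): set it to the id of the other image.
MATCH (a:Image), (b:Image)
WHERE id(a) = 1932 AND id(b) = $otherId
RETURN a.path AS inputImage,
       b.path AS otherImage,
       gds.alpha.similarity.cosine(a.vec, b.vec) AS cosineSimilarity
```

If this value is high for two images that look nothing alike, the problem is probably in the embedding rather than in the ANN parameters.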

You're also returning the top 20 most similar images with a similarityCutoff of only 0.1, which could give you some fairly dissimilar images. If you return the similarity scores of the pairs, what are they? And do you get the same value when you run cosine similarity over that pair?
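
One way to look at those scores is sketched below. It assumes the write procedure stored each similarity on the SIMILAR_APPROX relationship under the default property name score (you did not set writeProperty, so this is a guess worth verifying), and that gds.alpha.similarity.cosine exists in your GDS version:

```cypher
// List the neighbours written by ANN together with the stored score
// and an exact cosine similarity recomputed from the vectors.
// s.score assumes the default writeProperty; adjust it if yours differs.
MATCH (r:Image)-[s:SIMILAR_APPROX]->(i:Image)
WHERE id(r) = 1932
RETURN i.path AS similarImage,
       s.score AS annScore,
       gds.alpha.similarity.cosine(r.vec, i.vec) AS exactCosine
ORDER BY annScore DESC
```

If annScore and exactCosine agree but the images still come from other categories, the vectors themselves do not separate the categories well. If they disagree, look at the ANN configuration.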