12-01-2022 02:40 PM
Hi, I'm new to Neo4j. I'm trying to use KNN in GDS to calculate similarities. I understand that KNN calculates similarities between all pairs of nodes in the graph and finds the k most similar nodes. However, what I'm looking for is this: given a node N, find the node in the database that is most similar to N. How can I achieve this? Thank you for your help.
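For a single query node, the task reduces to one exhaustive scan: score N against every other node and keep the best. Below is a minimal plain-Python sketch. It assumes each node is represented by a numeric property vector and uses cosine similarity. The node ids, vectors, and the `most_similar` helper are hypothetical, and this is not how GDS implements KNN.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query_id: str, vectors: Dict[str, List[float]]) -> Optional[Tuple[str, float]]:
    """Return (node_id, score) of the node most similar to query_id, excluding itself."""
    query = vectors[query_id]
    best: Optional[Tuple[str, float]] = None
    for node_id, vec in vectors.items():
        if node_id == query_id:
            continue  # a node is trivially most similar to itself, so skip it
        score = cosine(query, vec)
        if best is None or score > best[1]:
            best = (node_id, score)
    return best


# Hypothetical nodes with 3-dimensional property vectors.
nodes = {
    "N": [1.0, 0.0, 1.0],
    "a": [0.9, 0.1, 1.1],
    "b": [0.0, 1.0, 0.0],
    "c": [-1.0, 0.0, -1.0],
}
result = most_similar("N", nodes)
print(result)  # the best match is node "a"
```

This costs one similarity evaluation per other node, so it grows linearly with the size of the graph.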
12-01-2022 07:46 PM
Have you tried to use the gds KNN algorithm? If so, what went wrong?
12-01-2022 08:31 PM
Yes, I used KNN, but it took a long time (2 minutes) to calculate the nearest neighbors for the 40,000 nodes in my database. What I'm hoping for is to calculate the nearest neighbor for one given node only. I tried to find out whether KNN has such functionality but couldn't find any.
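A rough comparison count shows why a per-node query should be much cheaper. The figures below assume every similarity is computed exhaustively. GDS KNN is documented as avoiding a full all-pairs comparison, so read the all-pairs number as an upper bound, not as what the algorithm actually does. Both helper functions are hypothetical.

```python
def all_pairs_comparisons(n: int) -> int:
    """Unordered node pairs scored when every node is compared with every other."""
    return n * (n - 1) // 2


def single_source_comparisons(n: int) -> int:
    """Comparisons when one query node is scored against the other n - 1 nodes."""
    return n - 1


n = 40_000  # node count mentioned in the thread
print(all_pairs_comparisons(n))      # 799980000
print(single_source_comparisons(n))  # 39999
```

Under this assumption, the single-node query needs exactly n / 2 = 20,000 times fewer similarity evaluations.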
12-01-2022 08:48 PM - edited 12-01-2022 08:49 PM
There appears to be a filtered version of KNN, where you can specify the source and/or target nodes. The filter can be for specific nodes or labels. With this, you should be able to put your single node in the source node filter, so it finds the K nearest neighbors for just that node.
It appears to be in the alpha tier:
https://neo4j.com/docs/graph-data-science/current/algorithms/alpha/filtered-knn/
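Conceptually, the source filter decides which nodes get a neighbor list. The target filter decides which nodes may appear in those lists. The plain-Python sketch below illustrates only that idea. It is not the GDS implementation or its API. The labels, vectors, and the `filtered_knn` helper are hypothetical, and it scores exhaustively with cosine similarity for simplicity.

```python
import heapq
import math
from typing import Callable, Dict, List, Tuple

Node = Tuple[str, str, List[float]]  # (node_id, label, property vector): hypothetical shape


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_knn(
    nodes: List[Node],
    k: int,
    source_filter: Callable[[Node], bool],
    target_filter: Callable[[Node], bool],
) -> Dict[str, List[Tuple[str, float]]]:
    """For each node passing source_filter, return its top-k most similar
    nodes among those passing target_filter (never the node itself)."""
    sources = [n for n in nodes if source_filter(n)]
    targets = [n for n in nodes if target_filter(n)]
    result: Dict[str, List[Tuple[str, float]]] = {}
    for sid, _, svec in sources:
        scored = [(tid, cosine(svec, tvec)) for tid, _, tvec in targets if tid != sid]
        # nlargest keeps the k highest scores, sorted best first
        result[sid] = heapq.nlargest(k, scored, key=lambda pair: pair[1])
    return result


nodes: List[Node] = [
    ("N", "Item", [1.0, 0.0, 1.0]),
    ("a", "Item", [0.9, 0.1, 1.1]),
    ("b", "Item", [0.0, 1.0, 0.0]),
    ("x", "Archived", [1.0, 0.0, 1.0]),
]

# Single source node "N"; only "Item" nodes are allowed as neighbors.
neighbors = filtered_knn(
    nodes,
    k=1,
    source_filter=lambda n: n[0] == "N",
    target_filter=lambda n: n[1] == "Item",
)
print(neighbors)  # "a" wins; "x" would score higher but is excluded by the target filter
```

Restricting the sources to one node removes most of the work. Restricting the targets shrinks what remains.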
12-02-2022 09:12 AM
Ah, I see, thank you. I guess I'll have to wait for the next GDS release. In the meantime, I'll look for other ways.
12-02-2022 10:38 AM
As I understand it, the alpha and beta versions are accurate. They just may change before being promoted out of the alpha or beta tier.
12-02-2022 10:40 AM
I just tried the filtered KNN algorithm and it worked! But it takes around 1 minute to compare against only 40,000 nodes. Is that typical?
12-02-2022 10:51 AM - edited 12-02-2022 10:52 AM
Sorry, I am not a user of GDS, so I can’t comment.
12-02-2022 11:46 AM
Are there target nodes you can filter out to speed up the calculation?
12-08-2022 11:58 AM
Not really. I had to search against all the nodes to find the most similar one. I already have this implemented in a SQL database, so I thought moving to a graph database would speed things up, but it looks like there's not much improvement.
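If the node properties are numeric vectors, an exhaustive search for one node is a single matrix-vector product. That handles tens of thousands of rows in one vectorized step. Below is a minimal NumPy sketch. It assumes an (n, d) array of property vectors with no all-zero rows, and it uses cosine similarity. The random data stands in for real node properties, and the `nearest_node` helper is hypothetical. The sketch makes no claim about how GDS or the SQL implementation performs.

```python
from typing import Tuple

import numpy as np


def nearest_node(vectors: np.ndarray, query_index: int) -> Tuple[int, float]:
    """Index and cosine similarity of the row most similar to vectors[query_index],
    excluding the query row itself. Assumes no row is all zeros."""
    # Normalize rows so that a plain dot product equals cosine similarity.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = unit @ unit[query_index]  # one similarity per node, shape (n,)
    scores[query_index] = -np.inf      # never match the query to itself
    best = int(np.argmax(scores))
    return best, float(scores[best])


rng = np.random.default_rng(seed=0)
vectors = rng.normal(size=(40_000, 64))  # 40,000 hypothetical nodes, 64 features each
vectors[123] = vectors[7] * 2.0          # plant a node pointing in the same direction as node 7

best, score = nearest_node(vectors, query_index=7)
print(best, round(score, 6))  # 123 1.0
```

For simplicity, the sketch normalizes on every call. For many repeated single-node lookups against the same data, you would normalize once and reuse the `unit` matrix.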