Neo4j

sakshi · 12-03-2021

How to get topN for each node in link prediction algorithm in the GDS

Hi! I am using the link prediction algorithm from the GDS library to predict links in a network. I generate the fastRP embeddings using 10 node properties. It takes 5-6 minutes to train the model on 27k nodes and 4 million relationships (only 2 relationship types and 1 node type). I have 2 questions.

When I predict the links after training, can I get the topN for every node? (Let's say I want the top 10 possible links for every node, even if the probability is as low as 0.00001.)
Is there a better way to write the prediction query so that it takes less time? I am using the following code to predict the links after training.

WITH "CALL gds.alpha.ml.linkPrediction.predict.stream('Mygraph', {relationshipTypes: ['connected'],modelName: 'linkpredict_with_embedding',topN: 1800,  threshold: 0.00001}) YIELD node1, node2, probability MATCH (n), (m) WHERE id(n) = node1 AND id(m) = node2  RETURN n.nodeid AS node1, m.nodeid AS node2, probability;" AS query
  CALL apoc.export.csv.query(query, "predcited_links.csv", {}) YIELD file RETURN file;

alicia_frame1 · 12-03-2021

If you try out the 1.8 release of the library (published this week!) you'll see we've made some improvements to Link Prediction that should result in much faster predictions.

Link prediction now also supports both topN and topK to limit your results; see "Link prediction pipelines" in the Neo4j Graph Data Science documentation.

You can also use gds.beta.graph.export.csv (after mutating your projection) instead of streaming your results and using apoc.export.csv.query().
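
If you want to stay on your current version for now, here is a rough sketch of how a per-node top 10 could be done in plain Cypher on top of the stream call from the question. The procedure call and its parameters are copied from the question; the topN value of 1000000 is only a placeholder assumption (it has to be large enough that every node's candidates survive the global cut, which may be expensive), and the aliases source, target and top are made up for illustration. It also assumes that a predicted pair may be streamed only once, so each pair is expanded in both directions and de-duplicated before grouping.

```cypher
// Sketch only: keep the 10 most probable predicted links for every node.
CALL gds.alpha.ml.linkPrediction.predict.stream('Mygraph', {
  relationshipTypes: ['connected'],
  modelName: 'linkpredict_with_embedding',
  topN: 1000000,          // assumption: large enough to cover every node
  threshold: 0.00001
})
YIELD node1, node2, probability
// Treat each predicted pair as undirected: emit it once per endpoint
UNWIND [[node1, node2], [node2, node1]] AS pair
WITH DISTINCT pair[0] AS source, pair[1] AS target, probability
// Sort first so that collect() builds each list in descending probability
ORDER BY probability DESC
// Group by source node and keep only the first 10 candidates
WITH source, collect({target: target, probability: probability})[..10] AS top
UNWIND top AS t
RETURN gds.util.asNode(source).nodeid AS node1,
       gds.util.asNode(t.target).nodeid AS node2,
       t.probability AS probability;
```

The sort-then-collect-then-slice pattern does the per-node limiting after the predictions have been produced, so it does not make the prediction itself any faster; it only shapes the output.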

sakshi · 12-06-2021

Thanks a lot, Alicia, for the immediate response. Looking forward to using this updated version.