12-03-2021 02:03 AM
How to get topN for each node with the link prediction algorithm in GDS
Hi! I am using the link prediction algorithm from the GDS library to predict links in a network. I generate the FastRP embeddings using 10 node properties. It takes 5-6 minutes to train the model on 27k nodes and 4 million relationships (with only 2 relationship types and 1 node type). I have 2 questions.
WITH "CALL gds.alpha.ml.linkPrediction.predict.stream('Mygraph', {relationshipTypes: ['connected'],modelName: 'linkpredict_with_embedding',topN: 1800, threshold: 0.00001}) YIELD node1, node2, probability MATCH (n), (m) WHERE id(n) = node1 AND id(m) = node2 RETURN n.nodeid AS node1, m.nodeid AS node2, probability;" AS query
CALL apoc.export.csv.query(query, "predcited_links.csv", {}) YIELD file RETURN file;
12-03-2021 09:32 AM
If you try out the 1.8 edition of the library (published this week!) you'll see we've made some improvements to Link Prediction that should result in much faster predictions.
Link prediction now supports both topN and topK to limit your results as well: Link prediction pipelines - Neo4j Graph Data Science
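To illustrate the difference between the two limits, here is a minimal Python sketch that post-processes streamed (node1, node2, probability) rows on the client side. It is not how GDS implements these options. The function names and sample rows are hypothetical, and for simplicity it groups predictions by node1 only. The idea is that a global topN keeps the best N pairs overall, while a per-node topK keeps the best K candidates for each node.

```python
from collections import defaultdict
import heapq


def top_n_overall(rows, n):
    """Keep the n highest-probability predictions across all nodes."""
    return heapq.nlargest(n, rows, key=lambda row: row[2])


def top_k_per_node(rows, k):
    """Keep the k highest-probability predictions for each node1.

    Assumption: each row is (node1, node2, probability) and only
    node1 is used as the grouping key.
    """
    candidates = defaultdict(list)
    for node1, node2, probability in rows:
        candidates[node1].append((probability, node2))
    return {
        node: [(node2, p) for p, node2 in heapq.nlargest(k, found)]
        for node, found in candidates.items()
    }


# Hypothetical sample of streamed predictions.
rows = [
    ("a", "b", 0.9),
    ("a", "c", 0.8),
    ("a", "d", 0.1),
    ("b", "c", 0.7),
    ("b", "d", 0.2),
]

overall = top_n_overall(rows, 2)
per_node = top_k_per_node(rows, 2)
```

With this sample, the global limit of 2 returns only pairs that start at node "a", whereas the per-node limit returns two candidates each for "a" and "b". That is why a single global topN may leave some nodes without any predicted links.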
You can also use gds.beta.graph.export.csv (after mutating your projection) instead of streaming your results and using apoc.export.csv.query().
12-06-2021 07:44 AM
Thanks a lot, Alicia, for the immediate response. Looking forward to using this updated version.