12-02-2022 08:47 AM
I'm trying to build a link prediction pipeline to find novel links between entity nodes. My objective is to predict future links between proteins and targets, given known positive and negative links. I referred to the co-author link prediction tutorial, which treats every pair of nodes without a relationship as a negative example. In my case, however, I have two CSV files: one with the positive class (i.e., proteins that bind to a target) and one with the negative class (i.e., proteins that do not bind to a target). I created the network as (P:protein_id)-[:POSITIVE]->(T:target_id) and (P:protein_id)-[:NEGATIVE]->(T:target_id). Is this approach correct for link prediction?
I also want to include tested_species, target_measure_scale, target_measure_units and target_measure_val (which I will use as a weight property). All of these are string values except target_measure_val. Would it improve the prediction if I add them as properties on the target_id node, or should I model them as separate nodes? Can someone guide me on how to proceed? Thanks in advance.
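Since machine learning features generally have to be numeric, one option for the string columns is to encode them before storing them as properties. The following is only a minimal sketch of one-hot encoding in plain Python. The helper names `build_vocabulary` and `one_hot` are made up for illustration, and the species values are taken from the sample rows in this post. Whether this beats modelling the values as separate nodes is not something the sketch answers.

```python
def build_vocabulary(values):
    """Map each distinct string to a stable index (sorted for reproducibility)."""
    return {value: index for index, value in enumerate(sorted(set(values)))}


def one_hot(value, vocabulary):
    """Return a list of floats with 1.0 at the position assigned to `value`."""
    vector = [0.0] * len(vocabulary)
    vector[vocabulary[value]] = 1.0
    return vector


# tested_species values from the four sample rows (assumed to be representative).
species = ["homo sapiens", "trypanosoma cruzi", "homo sapiens", "rattus norvegicus"]

species_vocab = build_vocabulary(species)
encoded_species = [one_hot(s, species_vocab) for s in species]
```

The same two helpers could be reused for target_measure_scale and target_measure_units, each with its own vocabulary.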
| protein_id | target_id | tested_species | target_measure_scale | target_measure_units | target_measure_val | target |
| --- | --- | --- | --- | --- | --- | --- |
| A0JP26 | mus musculus | homo sapiens | ic50 | ug/ml | 0.01 | POSITIVE |
| A1L190 | hiv inhibition | trypanosoma cruzi | mc50 | um | 10 | POSITIVE |
| protein_id | target_id | tested_species | target_measure_scale | target_measure_units | target_measure_val | target |
| --- | --- | --- | --- | --- | --- | --- |
| A2RUB6 | venom activity | homo sapiens | ic50 | um | 1000 | NEGATIVE |
| A4D1B5 | signalling activity | rattus norvegicus | mc50 | ug/ml | 250 | NEGATIVE |
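To make the setup concrete, here is a rough sketch of how the two files could be combined into explicitly labeled protein/target pairs before loading them into the graph. It assumes pipe-separated input shaped like the sample rows above. The node labels `Protein` and `Target`, the `id` and `weight` property names, and the helper names are my own assumptions, not something prescribed by the tutorial or the library.

```python
import csv
import io

# Sample rows copied from this post; real input would come from the two CSV files.
POSITIVE_ROWS = """protein_id|target_id|tested_species|target_measure_scale|target_measure_units|target_measure_val|target
A0JP26|mus musculus|homo sapiens|ic50|ug/ml|0.01|POSITIVE
A1L190|hiv inhibition|trypanosoma cruzi|mc50|um|10|POSITIVE
"""
NEGATIVE_ROWS = """protein_id|target_id|tested_species|target_measure_scale|target_measure_units|target_measure_val|target
A2RUB6|venom activity|homo sapiens|ic50|um|1000|NEGATIVE
A4D1B5|signalling activity|rattus norvegicus|mc50|ug/ml|250|NEGATIVE
"""

# Relationship types used in the question, keyed by binary class label.
REL_TYPES = {1: "POSITIVE", 0: "NEGATIVE"}

# Hypothetical Cypher template: labels and property names are assumptions.
CYPHER_TEMPLATE = (
    "UNWIND $rows AS row "
    "MERGE (p:Protein {id: row.protein_id}) "
    "MERGE (t:Target {id: row.target_id}) "
    "MERGE (p)-[r:%s]->(t) "
    "SET r.weight = row.weight"
)


def read_rows(text):
    """Parse pipe-separated text into dicts with whitespace stripped."""
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    return [{k.strip(): v.strip() for k, v in row.items()} for row in reader]


def to_labeled_pairs(rows):
    """Keep only what the classifier needs: the pair, its label, and a numeric weight."""
    return [
        {
            "protein_id": row["protein_id"],
            "target_id": row["target_id"],
            "label": 1 if row["target"] == "POSITIVE" else 0,
            "weight": float(row["target_measure_val"]),
        }
        for row in rows
    ]


def cypher_for(label):
    """Build the load statement for one class; the type comes from a fixed allowlist."""
    return CYPHER_TEMPLATE % REL_TYPES[label]


labeled_pairs = to_labeled_pairs(read_rows(POSITIVE_ROWS) + read_rows(NEGATIVE_ROWS))
```

The list for each class could then be passed as the `$rows` parameter of the statement returned by `cypher_for`. Note that the sample values mix units (ug/ml and um), so using target_measure_val directly as a weight assumes they have been made comparable first.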