Neo4j

Kevin6482 · 12-02-2022

I'm trying to build a link prediction pipeline to find novel links between entity nodes. My objective is to predict future links between proteins and targets, given known positive and negative links. I followed the co-author link prediction tutorial, which treats every pair of nodes without a relationship as a negative example. In my case, however, I have two CSV files: one with the positive class (i.e., proteins that bind to a target) and one with the negative class (i.e., proteins that do not bind to a target). I created the network as (P:protein_id)-[:POSITIVE]->(T:target_id) and (P:protein_id)-[:NEGATIVE]->(T:target_id). Is that the correct approach for link prediction?
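For reference, the model described above could be loaded with one parameterised Cypher statement per relationship type. The following is only a minimal sketch: the `Protein` and `Target` labels, the `id` property, the `weight` property and the `build_merge_query` helper are all assumed names, not something taken from the tutorial or from my actual database.

```python
# Sketch only: the labels Protein/Target and the properties `id`/`weight` are assumed names.
ALLOWED_REL_TYPES = {"POSITIVE", "NEGATIVE"}


def build_merge_query(rel_type: str) -> str:
    """Return a Cypher statement that merges one relationship per input row."""
    if rel_type not in ALLOWED_REL_TYPES:
        raise ValueError(f"unexpected relationship type: {rel_type!r}")
    # The type is interpolated into the query text, hence the whitelist check above.
    return (
        "UNWIND $rows AS row\n"
        "MERGE (p:Protein {id: row.protein_id})\n"
        "MERGE (t:Target {id: row.target_id})\n"
        f"MERGE (p)-[r:{rel_type}]->(t)\n"
        "SET r.weight = row.target_measure_val"
    )


positive_query = build_merge_query("POSITIVE")
negative_query = build_merge_query("NEGATIVE")
print(positive_query)
```

Each query would then be run with a `rows` parameter holding the parsed lines of the matching CSV file.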

I also want to include tested_species, target_measure_scale, target_measure_units and target_measure_val (which I will use as a weight property). All of these are string values except target_measure_val. Would it improve the prediction if I added them as properties on the target_id node, or should I add them as separate nodes? Can someone advise on how to proceed? Thanks in advance.
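One thing that seems relevant to the string columns: as far as I know, the Graph Data Science library only works with numeric properties in a projected graph, so string values such as the scale and units would need a numeric encoding whichever place they end up in. Below is a minimal sketch of a one-hot encoding in plain Python. The helper names `build_vocabulary` and `one_hot` are hypothetical, and the input lists just repeat the values from my sample rows.

```python
def build_vocabulary(values):
    """Map each distinct string to a stable integer index (sorted for determinism)."""
    return {value: index for index, value in enumerate(sorted(set(values)))}


def one_hot(value, vocabulary):
    """Return a list of floats with a single 1.0 at the index of `value`."""
    vector = [0.0] * len(vocabulary)
    vector[vocabulary[value]] = 1.0
    return vector


# Values taken from the four sample rows in this post.
scales = ["ic50", "mc50", "ic50", "mc50"]
units = ["ug/ml", "um", "um", "ug/ml"]

scale_vocab = build_vocabulary(scales)
unit_vocab = build_vocabulary(units)

# One feature vector per row: scale one-hot followed by unit one-hot.
features = [
    one_hot(scale, scale_vocab) + one_hot(unit, unit_vocab)
    for scale, unit in zip(scales, units)
]
print(features[0])
```

Whether such a vector is better stored on the node, on the relationship, or modelled as separate nodes is exactly the part I am unsure about.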

protein_id	target_id	tested_species	target_measure_scale	target_measure_units	target_measure_val	target
A0JP26	mus musculus	homo sapiens	ic50	ug/ml	0.01	POSITIVE
A1L190	hiv inhibition	trypanosoma cruzi	mc50	um	10	POSITIVE

protein_id	target_id	tested_species	target_measure_scale	target_measure_units	target_measure_val	target
A2RUB6	venom activity	homo sapiens	ic50	um	1000	NEGATIVE
A4D1B5	signalling activity	rattus norvegicus	mc50	ug/ml	250	NEGATIVE
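The two files above can be read into row dictionaries before loading. This is a rough sketch using only the Python standard library, assuming the files are tab-separated as shown; `read_rows` and `split_by_class` are hypothetical helper names, and the inline `SAMPLE` string stands in for the real files.

```python
import csv
import io

# Two sample rows copied from the tables in this post (tab-separated).
SAMPLE = (
    "protein_id\ttarget_id\ttested_species\ttarget_measure_scale\t"
    "target_measure_units\ttarget_measure_val\ttarget\n"
    "A0JP26\tmus musculus\thomo sapiens\tic50\tug/ml\t0.01\tPOSITIVE\n"
    "A2RUB6\tvenom activity\thomo sapiens\tic50\tum\t1000\tNEGATIVE\n"
)


def read_rows(handle):
    """Parse tab-separated rows and convert the measured value to float."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["target_measure_val"] = float(row["target_measure_val"])
        rows.append(row)
    return rows


def split_by_class(rows):
    """Group rows by the `target` column (POSITIVE / NEGATIVE)."""
    groups = {}
    for row in rows:
        groups.setdefault(row["target"], []).append(row)
    return groups


rows = read_rows(io.StringIO(SAMPLE))
by_class = split_by_class(rows)
print({label: len(group) for label, group in by_class.items()})
```

Converting `target_measure_val` to a float at this stage keeps it usable as a numeric weight later on.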


Link Prediction Pipeline for Protein Target Binding