08-04-2022 05:54 AM - edited 08-04-2022 05:59 AM
Let me quickly describe my graph.
I have a graph that links every mammalian species through its taxonomy. Below is a small example for Hominoidea:
There are five organisms (HSA, PPS, PTR, GGO, PON) at the end of this lineage. Only organisms at the end of a lineage have the property kegg=kegg_genome_id.
Each of these organism nodes has relationships to nodes of a different type, labelled KO (functional orthologs). The example below shows this for just two organisms. The same KO node can link to many mammalian organisms, such as an elephant, a human, or a mouse (or even to all mammals).
This results in a network with 337 taxon nodes (111 of which are organisms), 12,142 KO nodes, and over 1,200,000 relationships.
Now I want to build a model that predicts, based on its KO links, whether a given species belongs to Euarchontoglires. Every organism node that falls under Euarchontoglires has the property category=1; all other organisms have category=0.
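A minimal sketch of how the `category` label follows from the taxonomy, assuming each taxon node has exactly one parent. The lineage below is simplified (intermediate ranks are omitted) and hypothetical: MOUSE and ELEPHANT are placeholder names, not KEGG genome ids. In the real graph the same check would be a path query up the taxon hierarchy.

```python
# Simplified, hypothetical child -> parent map of the taxon tree
# (intermediate ranks such as Euarchonta and Glires are omitted).
PARENT = {
    "HSA": "Hominoidea",
    "PTR": "Hominoidea",
    "Hominoidea": "Primates",
    "Primates": "Euarchontoglires",
    "MOUSE": "Rodentia",
    "Rodentia": "Euarchontoglires",
    "Euarchontoglires": "Mammalia",
    "ELEPHANT": "Afrotheria",
    "Afrotheria": "Mammalia",
}


def category(organism: str, clade: str = "Euarchontoglires") -> int:
    """Return 1 if `clade` is an ancestor of `organism`, else 0."""
    node = organism
    # Walk up the tree until we hit the clade or run out of parents.
    while node in PARENT:
        node = PARENT[node]
        if node == clade:
            return 1
    return 0


labels = {org: category(org) for org in ["HSA", "PTR", "MOUSE", "ELEPHANT"]}
print(labels)  # {'HSA': 1, 'PTR': 1, 'MOUSE': 1, 'ELEPHANT': 0}
```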
This was just an introduction.
What I want to know is how I can compute node2vec embeddings ONLY for these organism nodes. We do not want embeddings for the KO nodes.
I have a projected graph:
I do not know how to run gds.beta.node2vec.write so that it writes embeddings only for the nodes I will later use for ML.
Can you guide me?
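One common pattern, assuming you keep the full organism–KO graph in the projection so the random walks can pass through KO nodes: compute embeddings for all nodes in stream mode and keep only the organism rows. In Cypher that is roughly `CALL gds.beta.node2vec.stream('graph', {embeddingDimension: 64}) YIELD nodeId, embedding WITH gds.util.asNode(nodeId) AS n, embedding WHERE n.kegg IS NOT NULL RETURN n.kegg, n.category, embedding`, where the graph name and dimension are placeholders. The Python below only illustrates the filtering step; the node ids, properties, and vectors are made up.

```python
from typing import Dict, List, Tuple

# Hypothetical rows as returned by a node2vec stream call: (nodeId, embedding).
streamed: List[Tuple[int, List[float]]] = [
    (0, [0.12, -0.40]),  # organism (has a kegg property)
    (1, [0.33, 0.05]),   # organism
    (2, [-0.21, 0.18]),  # KO node
    (3, [0.07, 0.91]),   # KO node
]

# Hypothetical nodeId -> node properties lookup; only organisms carry `kegg`.
properties: Dict[int, Dict[str, object]] = {
    0: {"kegg": "HSA", "category": 1},
    1: {"kegg": "PTR", "category": 1},
    2: {},
    3: {},
}


def organism_embeddings(
    rows: List[Tuple[int, List[float]]],
    props: Dict[int, Dict[str, object]],
) -> List[Tuple[object, List[float], object]]:
    """Keep (kegg_id, embedding, category) only for nodes that have a `kegg` property."""
    out = []
    for node_id, embedding in rows:
        p = props.get(node_id, {})
        if "kegg" in p:  # KO nodes have no kegg property and are dropped
            out.append((p["kegg"], embedding, p["category"]))
    return out


features = organism_embeddings(streamed, properties)
print([kegg for kegg, _, _ in features])  # ['HSA', 'PTR']
```

Filtering after the fact keeps the KO nodes in the walks, which is plausibly what you want here: organisms whose walks visit the same KO nodes should end up close in embedding space.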
08-05-2022 06:34 AM
You could instead use a Cypher projection, where you can use arbitrary filters and pattern matches to determine which nodes and relationships are projected into the in-memory graph.
https://neo4j.com/docs/graph-data-science/current/graph-project-cypher/
https://neo4j.com/docs/graph-data-science/current/graph-project-cypher-aggregation/
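If you follow the Cypher-projection route, one option is to project only organism nodes and connect two organisms with a weight equal to the number of KO nodes they share, for example by aggregating a pattern like `(o1)-->(:KO)<--(o2)` per organism pair. The relationship types and the exact projection query are not shown in this thread, so the sketch below only computes that organism–organism graph in plain Python on made-up KO ids, to show what would be projected.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

# Hypothetical organism -> set of linked KO ids (made-up values).
KOS: Dict[str, Set[str]] = {
    "HSA": {"K1", "K2", "K3"},
    "PTR": {"K1", "K2"},
    "GGO": {"K2", "K4"},
}


def shared_ko_weights(kos: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Weight each unordered organism pair by the number of KO nodes both link to."""
    weights: Dict[Tuple[str, str], int] = {}
    for a, b in combinations(sorted(kos), 2):
        shared = len(kos[a] & kos[b])
        if shared:  # skip pairs with no KO in common
            weights[(a, b)] = shared
    return weights


print(shared_ko_weights(KOS))
# {('GGO', 'HSA'): 1, ('GGO', 'PTR'): 1, ('HSA', 'PTR'): 2}
```

With 111 organisms there are at most 111 * 110 / 2 = 6,105 undirected pairs, so such a projection stays small, and every node in it is an organism. The shared-KO count could then be passed to node2vec as a relationship weight (`relationshipWeightProperty`), if your GDS version supports it.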