Neo4j

Kevin6482 · 08-04-2022

I'm constructing a biomedical knowledge graph from data I collected from different open sources. All values in each node are unique, and there are no duplicate rows in the relationships (verified thoroughly).

These are my nodes: assays, cells, clinicals, compounds, disorders, drugs, foods, genes, metabolites, organisms, pathways, peptides, proteins, targets, therapeutics.

These are my relationships: cell_FROM_species, clinical_IS_ASSOCIATED_disorder, clinical_IS_ASSOCIATED_drug, compound_IS_ASSOCIATED_protein, drug_CAUSES_disorder, drug_INTERACTS_target, food_IS_ASSOCIATED_compound, metabolite_IS_ASSOCIATED_pathway, peptide_TESTED_IN_assay, peptide_BINDS_TO_protein, peptide_IS_ASSOCIATED_therapeutics, protein_IS_ASSOCIATED_disorder, protein_IS_ASSOCIATED_gene, protein_COMES_FROM_organism, protein_IS_EXPRESSED_IN_pathway.

I imported the data with neo4j-admin using the command below (it's a long one, so I only show a sample):

C:/Users/mypc/.Neo4jDesktop/relate-data/dbmss/bin/neo4j-admin import --database=db1 --nodes=import/assays.csv --nodes=import/cells.csv --nodes=import/clinicals.csv --………………………………………. --relationships=import/ cell_FROM_species.csv --relationships=import/ clinical_IS_ASSOCIATED_disorder.csv …………………………………………………………………………--multiline-fields=true

I ended up with this schema, in which I can see that some new relationships have been created between nodes. For example:

peptide IS_ASSOCIATED with compound, which I didn't specify.
protein IS_ASSOCIATED with compound, but I specified the opposite direction: compound IS_ASSOCIATED with protein.
Also, why is compound IS_ASSOCIATED with compound (the same node)?

Can someone tell me where I'm going wrong? Thanks in advance.

#neo4j-admin #relationships

glilienfield · 08-04-2022

Did you use 'db.schema.visualization' to get the schema? I recall helping someone months ago whose schema did not accurately represent their data, and I believe someone else in the community mentioned a known issue with this method. The relationships did not actually exist in that person's data. I suggest you query your data to verify whether those relationships actually exist. Something like this for each relationship you don't expect:

return exists( (:Peptide)-[:IS_ASSOCIATED]->(:Compound) )

The following query should give you an inventory of your relationships. It assumes your data model has only one label per node.

match(n)-[r]->(m)
return labels(n)[0] as `start node`, type(r) as `relationship type`, labels(m)[0] as `end node`, count(*) as count
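
To compare that inventory against what was actually imported, a small script can help. The following is a minimal Python sketch, not part of the original answer. It assumes the relationship file names from the question follow a start_TYPE_end pattern and that node labels match those name fragments when case is ignored (this may not hold everywhere, for example cell_FROM_species versus an organisms node). The function names and the sample rows are hypothetical.

```python
# Relationship file names listed in the question, in the form start_TYPE_end.
EXPECTED_NAMES = [
    "cell_FROM_species",
    "clinical_IS_ASSOCIATED_disorder",
    "clinical_IS_ASSOCIATED_drug",
    "compound_IS_ASSOCIATED_protein",
    "drug_CAUSES_disorder",
    "drug_INTERACTS_target",
    "food_IS_ASSOCIATED_compound",
    "metabolite_IS_ASSOCIATED_pathway",
    "peptide_TESTED_IN_assay",
    "peptide_BINDS_TO_protein",
    "peptide_IS_ASSOCIATED_therapeutics",
    "protein_IS_ASSOCIATED_disorder",
    "protein_IS_ASSOCIATED_gene",
    "protein_COMES_FROM_organism",
    "protein_IS_EXPRESSED_IN_pathway",
]


def parse_name(name):
    """Split 'start_TYPE_end' into (start, TYPE, end)."""
    start, rest = name.split("_", 1)      # text before the first underscore
    rel_type, end = rest.rsplit("_", 1)   # text after the last underscore
    return start, rel_type, end


def unexpected(observed_rows, expected_names=EXPECTED_NAMES):
    """Return inventory rows whose (start, type, end) was not imported.

    Each row is (start label, relationship type, end label, count),
    i.e. the four columns returned by the inventory query.
    Assumption: labels equal the file-name fragments, ignoring case.
    """
    expected = {parse_name(n) for n in expected_names}
    return [
        row for row in observed_rows
        if (row[0].lower(), row[1], row[2].lower()) not in expected
    ]


if __name__ == "__main__":
    # Hypothetical rows; replace with the real output of the inventory query.
    sample = [
        ("Compound", "IS_ASSOCIATED", "Protein", 10),
        ("Peptide", "IS_ASSOCIATED", "Compound", 3),
    ]
    for row in unexpected(sample):
        print("not in import list:", row)
```

If the inventory query returns no row that this check flags, the extra relationships exist only in the schema visualization and not in the data.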

Kevin6482 · 08-07-2022

Thanks for your response. Yes, I used 'db.schema.visualization' to get this schema. I ran your Cypher query and found that those relationships do not actually exist, but I don't know why the schema was showing them. I also found that this issue was fixed with the apoc version.

Do relationships get automatically generated between nodes?