09-07-2022 07:54 AM
Hi Neo4j community,
Following on from some previous advice (thank you @Rcolinp ) I was able to import the following ontology file (Thesaurus_22.06d.OWL (https://evs.nci.nih.gov/ftp1/NCI_Thesaurus/Thesaurus_22.06d.OWL.zip)) into a neo4j knowledge graph.
Although it appears all triples have been parsed and imported, the relationships displayed are not what I expected. I believe all the data is there, so I suspect the cause is how I configured the graph prior to importing the data.
FWIW I'm looking to emulate something along the lines of the following
This is what I see after running the following query
MATCH (alpelisib)
WHERE single(x IN alpelisib.ns2__NHC0 WHERE x = "C94214")
RETURN alpelisib
As you can see, nodes appear correctly, but relationships do not.
Any pointers as to where I may be going wrong would be greatly appreciated.
Thanks in advance.
Chris
09-08-2022 04:53 PM - edited 09-08-2022 05:18 PM
Hi @ceiag,
Thanks for providing the graph view of your example from NCIt. I'll use that as my reference for what you are looking for when importing the Thesaurus.owl OWL ontology.
Your hunch that the graph configuration you set prior to import is causing the issues illustrated above is correct, but it isn't the only reason the NCIt graph view differs in naming convention from what you are seeing in Neo4j. What you are seeing in Neo4j with the current configuration is actually the raw OWL ontology, except that the URIs have been shortened and given an nsX__ prefix (for example ns2__). This is a result of Neo4j applying the default value of the handleVocabUris parameter (see more here --> Configuring Neo4j to use RDF data).
The NCIt Graph View, on the other hand, is a transformed visualization that surfaces the rdfs:label of each association in this ontology rather than the URI or shortened URI (in Neo4j we are seeing the shortened form). See the NCI Thesaurus documentation for how the metadata within this ontology translates to "human readable language" (Thesaurus.owl metadata documentation).
With that said, there is a way to perform this transformation within Neo4j! No worries! But first, let's take a quick look at your current graphConfig:
In your current graphConfig, handleMultival is set to "ARRAY". When handleMultival is "ARRAY" and no list of property URIs is supplied as multivalPropList, Neo4j imports and stores every property value as an array, including single-valued properties for which an array makes no sense. So if handleMultival needs to be "ARRAY", you should also specify multivalPropList within the graphConfig. This isn't the reason you are seeing ns2__A31 rather than Has_GDC_Value, but it does mean that all node property values are stored as arrays when not all of them should be.
Easy Initial Solution to Node Properties as Array Problem:
Change your graphConfig to either omit handleMultival entirely (if the ontology doesn't contain any multi-valued properties, or you don't care about the ones it does contain) OR name the exact multi-valued properties that should be stored as arrays by specifying multivalPropList within your graphConfig. The first option looks like this:
graphConfig:
CALL n10s.graphconfig.init( { handleRDFTypes: "LABELS_AND_NODES" } );
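If some properties do need to stay as arrays, the URIs to put in multivalPropList can be worked out from the file itself. The following is only a rough sketch in Python (standard library only), not part of n10s: multivalued_properties is a hypothetical helper, and the two P90 values in the sample are placeholders rather than real NCIt data. It assumes a property counts as multi-valued when the same literal-valued element appears more than once on at least one class.

```python
import xml.etree.ElementTree as ET
from collections import Counter

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"

# Tiny stand-in for Thesaurus.owl; the P90 values are placeholders.
SAMPLE = """<rdf:RDF xmlns="http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <owl:Class rdf:about="http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C94214">
    <NHC0>C94214</NHC0>
    <P90>placeholder synonym one</P90>
    <P90>placeholder synonym two</P90>
    <rdfs:label>placeholder label</rdfs:label>
  </owl:Class>
</rdf:RDF>"""


def multivalued_properties(owl_xml: str) -> list[str]:
    """Return URIs of literal-valued properties that repeat on at least one class."""
    root = ET.fromstring(owl_xml)
    multi = set()
    for cls in root.findall(f"{{{OWL}}}Class"):
        # Count only children that carry literal text (no rdf:resource, no nesting).
        counts = Counter(
            child.tag
            for child in cls
            if f"{{{RDF}}}resource" not in child.attrib and (child.text or "").strip()
        )
        multi.update(tag for tag, n in counts.items() if n > 1)
    # ElementTree writes tags as "{namespace}local"; join them back into a URI.
    return sorted(tag[1:].replace("}", "", 1) for tag in multi)


if __name__ == "__main__":
    for uri in multivalued_properties(SAMPLE):
        print(uri)
```

The printed URIs are candidates for multivalPropList. For the full Thesaurus.owl you would read the file from disk rather than from a string, and since the file is large, a streaming parser such as ET.iterparse is probably a better fit than loading it in one go.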
Cypher Statement to Review:
MATCH (alpelisib)-[r:ns2__A32]->(pharmSub)
WHERE alpelisib.ns2__NHC0 = "C94214"
RETURN alpelisib, r, pharmSub;
Result Vis:
Note that this is 100% correct based on the OWL file:
<!-- http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#A31 -->
<owl:AnnotationProperty rdf:about="http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#A31">
<NHC0>A31</NHC0>
<P106>Conceptual Entity</P106>
<P108>Has_GDC_Value</P108>
<P90>Has_GDC_Value</P90>
<P97>An association that connects a concept representing a GDC property to its dedicated permissible value concept(s).</P97>
<rdfs:label>Has_GDC_Value</rdfs:label>
<rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#anyURI"/>
</owl:AnnotationProperty>
We can see that http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#A31 has been shortened by Neo4j to ns2__A31 (as expected, since handleVocabUris within the graphConfig has fallen back to its default value, "SHORTEN").
As the OWL snippet shows, the AnnotationProperty has an rdfs:label value of Has_GDC_Value, but on import with this graphConfig Neo4j simply shortens the URI of the predicate; it does not substitute the label. If you'd like to rename these relationshipTypes (it sounds like you want to mirror the NCIt graph view), refer to Mapping Graph Models - Neosemantics (4.3). It walks you through setting the graph configuration so that you can use other neosemantics (n10s) procedures to add namespace prefix definitions and to map individual vocabulary elements to the names used in the NCIt graph view.
To help you get going I have provided the steps required to take below too 😃.
(Please note: You'll have to add each relationshipType as a distinct mapping using n10s.mapping.add()).
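Writing one mapping per relationshipType by hand gets tedious, so the statements can be generated from the declarations in the OWL file. This is a hedged sketch in Python (standard library only): mapping_statements is a hypothetical helper, not an n10s feature, and it assumes that Associations are the owl:AnnotationProperty declarations whose URI ends in "A" followed by digits and that carry an rdfs:label, as the A31 declaration does.

```python
import re
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# The A31 declaration quoted in this thread, wrapped in a minimal rdf:RDF root.
SAMPLE = """<rdf:RDF xmlns="http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <owl:AnnotationProperty rdf:about="http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#A31">
    <NHC0>A31</NHC0>
    <P108>Has_GDC_Value</P108>
    <rdfs:label>Has_GDC_Value</rdfs:label>
  </owl:AnnotationProperty>
</rdf:RDF>"""


def mapping_statements(owl_xml: str) -> list[str]:
    """Build one n10s.mapping.add call per labelled Association declaration."""
    root = ET.fromstring(owl_xml)
    statements = []
    for prop in root.findall(f"{{{OWL}}}AnnotationProperty"):
        uri = prop.get(f"{{{RDF}}}about", "")
        label = prop.findtext(f"{{{RDFS}}}label")
        # Assumption: Association codes look like "A" followed by digits.
        if label and re.search(r"#A\d+$", uri):
            statements.append(f'CALL n10s.mapping.add("{uri}", "{label.strip()}");')
    return statements


if __name__ == "__main__":
    print("\n".join(mapping_statements(SAMPLE)))
```

The output is plain Cypher text that can be pasted in after the prefix definitions and before the import. Labels containing double quotes would need escaping, which this sketch does not attempt.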
Solution To Get You Started:
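// Create Uniqueness Constraint (Neo4j 4.x syntax; on Neo4j 5.x use: CREATE CONSTRAINT n10s_unique_uri FOR (r:Resource) REQUIRE r.uri IS UNIQUE)
CREATE CONSTRAINT n10s_unique_uri ON (r:Resource) ASSERT r.uri IS UNIQUE;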
// Create GraphConfig --> need to SET handleVocabUris to Map. This will enable ability to ensure Neo4j mirrors NCIt Graph View
CALL n10s.graphconfig.init( {
handleVocabUris: "MAP"
});
// Create Prefix Definitions (using addFromText procedure from n10s)
CALL n10s.nsprefixes.addFromText('
<rdf:RDF xmlns="http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#"
xml:base="http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#"
xmlns:Thesaurus="http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#"
xmlns:xml="http://www.w3.org/XML/1998/namespace"
xmlns:protege="http://protege.stanford.edu/plugins/owl/protege#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:dc="http://purl.org/dc/elements/1.1/">
');
// Create Mapping from http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#A31 to Has_GDC_Value
CALL n10s.mapping.add("http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#A31", "Has_GDC_Value");
// Create Mapping from http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#A32 to Is_Value_For_GDC_Property
CALL n10s.mapping.add("http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#A32", "Is_Value_For_GDC_Property");
// Add all mappings...
// Lastly... Import Thesaurus.owl
CALL n10s.rdf.import.fetch('file:///var/lib/neo4j/import/Thesaurus.owl', 'RDF/XML');
Now we can query the graph & see the transformation:
MATCH (x)-[r:Is_Value_For_GDC_Property]->(y)
WHERE x.NHC0 = 'C94214'
RETURN x, r, y;
Desired Result!:
I hope this is of help to you! Feel free to ping back if you need more help!
Best,
Rob
02-03-2023 01:23 AM
Hey @ceiag
Apologies for delay! If no longer relevant to yourself, possibly I can at least leave an answer to help another person that stumbles upon this thread!
You are correct that owl:Restriction is a special kind of class. The reason you are having trouble representing Roles as a relationshipType in Neo4j is simply how the NCI Thesaurus OWL ontology represents its data. (Note: you'll have this issue with all metadata except Associations.)
In the NCI Thesaurus OWL ontology, only Associations end up as relationshipTypes in Neo4j, which can then be renamed using n10s mappings (as shown in the solution above). The reason is that an Association is asserted directly on a concept as a statement whose value is another concept, so the import has a subject, a predicate and an object to turn into a relationship. (The A31 declaration quoted earlier in this thread is an owl:AnnotationProperty whose range is anyURI.) Roles and, to my knowledge, the rest of the metadata are not asserted that way.
Roles appear inside owl:Restriction elements instead (as you noticed). A Restriction is a special kind of anonymous class that defines conditions and/or constraints on the classes in the ontology. As a consequence there is no direct statement between the two concepts for the import to turn into a relationship; the Role is only reachable through the Restriction's owl:onProperty and owl:someValuesFrom.
In your example, the gene_product_is_physical_part_of Role has the code R51. The Restriction constrains the C30168 class (Phosphatidylinositol 4,5-Bisphosphate 3-Kinase Catalytic Subunit Alpha Isoform) by stating that it must be related to some member of the C17270 class through R51, the gene_product_is_physical_part_of Role.
This is all just a consequence of how the data is represented in the OWL ontology. You could manipulate the OWL so that the Roles are asserted directly between concepts rather than through Restrictions. That said, such a modification has implications for the overall structure and meaning of the data: it may affect the validity of the ontology and/or change the meaning of the data altogether. It really comes down to your specific use case. Modifying the ontology in this way might be the right fix for you, but I wouldn't call it a blanket solution.
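One way to explore this without editing the ontology is to read the Restrictions out of the file and list each Role as a plain (concept, role, target) triple, which could later be loaded as direct relationships. The Python sketch below (standard library only) is an illustration under stated assumptions, not an n10s feature: role_triples is a hypothetical helper, and the sample is a completed version of the C30168 fragment posted in this thread. It flattens every owl:someValuesFrom Restriction found under a class and ignores whether the Restriction sits inside an intersection or a union, so the logical meaning of the class definition is lost.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"

# The C30168 fragment from this thread, closed off so that it parses.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/NCIT_C30168">
    <owl:equivalentClass>
      <owl:Class>
        <owl:intersectionOf rdf:parseType="Collection">
          <rdf:Description rdf:about="http://purl.obolibrary.org/obo/NCIT_C16984"/>
          <owl:Restriction>
            <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/NCIT_R51"/>
            <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/NCIT_C17270"/>
          </owl:Restriction>
        </owl:intersectionOf>
      </owl:Class>
    </owl:equivalentClass>
  </owl:Class>
</rdf:RDF>"""


def role_triples(owl_xml: str) -> list[tuple[str, str, str]]:
    """List (class URI, role URI, target class URI) for each someValuesFrom Restriction."""
    root = ET.fromstring(owl_xml)
    triples = []
    # Only named, top-level classes are treated as subjects.
    for cls in root.findall(f"{{{OWL}}}Class"):
        subject = cls.get(f"{{{RDF}}}about")
        if subject is None:
            continue
        # iter() also reaches Restrictions nested in anonymous classes.
        for restriction in cls.iter(f"{{{OWL}}}Restriction"):
            on_prop = restriction.find(f"{{{OWL}}}onProperty")
            target = restriction.find(f"{{{OWL}}}someValuesFrom")
            if on_prop is None or target is None:
                continue
            role = on_prop.get(f"{{{RDF}}}resource")
            obj = target.get(f"{{{RDF}}}resource")
            if role and obj:
                triples.append((subject, role, obj))
    return triples


if __name__ == "__main__":
    for subject, role, obj in role_triples(SAMPLE):
        print(subject, role, obj)
```

Whether loading such triples as relationships is acceptable depends on the use case, for the same reasons given above: a flattened triple says less than the Restriction it came from.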
Hopefully this is of help and informative at the very least!
Best,
Rob
09-07-2022 08:09 AM
What do you mean by the relationships not appearing correctly? One thing to keep in mind is that Neo4j Browser by default shows all connecting relationships between the nodes returned in a query. This may be misleading when you are looking for specific relationships between nodes. You can turn this setting off in the settings panel within the browser view. It is the 'connect result nodes' checkbox.
09-07-2022 08:19 AM
Hi,
Thanks for reply.
Take for example the Relationships between the node Alpelisib & Pharmacologic Substance. I would like to see the following.
Note the 'Has_GDC_Value' and 'is_Value_For_GDC_Property'. What I see in the Neo4j implementation is the following
I hope that makes sense.
Regards
Chris
09-07-2022 08:30 AM
The values shown on relationships and nodes are set in the browser. It looks like you want to display the relationship type, while you are showing a property of the relationship. You can change the visual appearance of a node label or relationship by clicking on either of them in the browser and setting the color, the property to show, and (for relationships) the width.
Click on a relationship in the browser. That will show the relationship properties to the right. Click on the relationship button, which will bring up a box as shown below. Select 'type' to show the relationship type in the graph.
09-08-2022 02:28 AM
Hi,
That is something I did previously try, but unfortunately to no avail.
See below for the corresponding snippet from the ontology file
<!-- http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#A31 -->
<owl:AnnotationProperty rdf:about="http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#A31">
<NHC0>A31</NHC0>
<P106>Conceptual Entity</P106>
<P108>Has_GDC_Value</P108>
<P90>Has_GDC_Value</P90>
<P97>An association that connects a concept representing a GDC property to its dedicated permissible value concept(s).</P97>
<rdfs:label>Has_GDC_Value</rdfs:label>
<rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#anyURI"/>
</owl:AnnotationProperty>
09-08-2022 02:42 AM
My mistake, you indicated in a previous reply that the values you want are properties. From the screenshot I can see that the relationship you selected does not have any properties, since the only options for display are the relationship’s type and its internal ‘id’. You may want to revisit how you imported the data, so you can include the properties you need.
Do you want to post the script and the data file to see if we can find the issue?
09-08-2022 06:36 AM
Hi,
Apologies for any confusion caused, I'm relatively new to the world of knowledge graphs and neo4j. Really appreciate your continued assistance.
Data file is available at the following location - https://evs.nci.nih.gov/ftp1/NCI_Thesaurus/Thesaurus_22.06d.OWL.zip
The scripts I used to load the data are as follows
CREATE CONSTRAINT n10s_unique_uri ON (r:Resource)
ASSERT r.uri IS UNIQUE;
CALL n10s.graphconfig.init( { handleMultival: "ARRAY", handleRDFTypes: "LABELS_AND_NODES" });
CALL n10s.rdf.import.fetch("file:///var/lib/neo4j/import/Thesaurus.owl","RDF/XML");
09-12-2022 08:09 AM
Hey @Rcolinp,
Thanks so much for this. Just working through it now, but it's exactly what I'm looking for.
Once again thanks for sharing your expertise, it's greatly appreciated.
Chris,
09-21-2022 09:52 AM
Hey @Rcolinp
Have been exploring the data a bit more closely. I have a further question, if you don't mind me asking. In https://evsexplore.nci.nih.gov/evsexplore/alldocs you will notice there is a 'Roles' section. Is there a method to make these roles appear as 'relationships'?
For example
C30168---->gene_product_is_physical_part_of*---------->C17270
*Role - gene_product_is_physical_part_of = #R51
Relevant lines in the .owl file
C30168: Phosphatidylinositol 4,5-Bisphosphate 3-Kinase Catalytic Subunit Alpha Isoform
<!-- http://purl.obolibrary.org/obo/NCIT_C30168 -->
<owl:Class rdf:about="http://purl.obolibrary.org/obo/NCIT_C30168">
<owl:equivalentClass>
<owl:Class>
<owl:intersectionOf rdf:parseType="Collection">
<rdf:Description rdf:about="http://purl.obolibrary.org/obo/NCIT_C16984"/>
….
<owl:Restriction>
<owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/NCIT_R51"/> <!-- R51: gene_product_is_physical_part_of/complex_has_physical_part -->
<owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/NCIT_C17270"/>
</owl:Restriction>
…
I did have a play around with the mapping options, but can't quite figure it out. I understand that an 'owl:Restriction' is a special kind of class, so I do wonder if that is a factor?
Thanks
Chris