Neo4j

r_moquin · ‎04-17-2020

Hi Jesus,

I am new to Neo4j, and since we use Neo4j 4.0 I'm looking at how we can integrate our RDF data into it.
So far I have one immediate piece of feedback (I'm not sure whether I'll have more, so I'm not waiting to aggregate everything into one feedback post; I hope that's OK). Your documentation includes the following caution:

"Make sure you know what you’re doing if you manipulate the prefix definition, especially after loading RDF data as you can overwrite namespaces in use, which would affect the possibility of regenerating the imported RDF."

If the intention is to be able to "regenerate the imported RDF", would it make sense to define prefix mappings when first encountered to match the RDF being loaded rather than automatically assigning a new prefix? For example, if your RDF had the following:

@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .

If the namespace http://www.w3.org/2008/05/skos-xl# hasn't been mapped to a prefix yet, could it be mapped to skosxl as indicated in the @prefix statement? If a second set of RDF statements were imported that defined a different prefix for the same namespace, you could just map it to the "discovered" prefix, exactly as is done today with the generated one. It feels like that would at least get you closer to being able to regenerate an imported RDF file. I find the generated prefixes a bit confusing when a prefix already exists in the RDF file. Hopefully this suggestion is helpful.

Thanks!

Ryan

jesus_barrasa · ‎04-17-2020

Hi @r.moquin, thanks for your comment.

While it sounds like a good idea a priori, it's not as easy as it looks. Namespace prefixes are local to a document (or, to be more precise, to a serialisation). Their only purpose is to make the document more readable to humans (in fact, not all serialisations allow namespace prefixes to be defined!).

All this to say that when we parse an RDF document, all we read are triples in which every resource is expanded to its full URI, and the prefix definitions are lost. Let me give you an example.
I can carefully write this document using Turtle as the serialisation format:

@prefix skos:  <http://www.w3.org/2004/02/skos/core#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .

<https://myvoc.com/aConcept>
        a                 skos:Concept ;
        skos:broader      <https://myvoc.com/aBroaderConcept> ;
        skosxl:prefLabel  <https://myvoc.com/aConcept_prefLabel_es> .

<https://myvoc.com/aConcept_prefLabel_es>
        a                   skosxl:Label ;
        skosxl:literalForm  "un concepto"@es .

But when parsed, it becomes a set of triples made up of URIs and literals (no trace of prefixes):

<https://myvoc.com/aConcept> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
<https://myvoc.com/aConcept> <http://www.w3.org/2004/02/skos/core#broader> <https://myvoc.com/aBroaderConcept> .
<https://myvoc.com/aConcept> <http://www.w3.org/2008/05/skos-xl#prefLabel> <https://myvoc.com/aConcept_prefLabel_es> .
<https://myvoc.com/aConcept_prefLabel_es> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2008/05/skos-xl#Label> .
<https://myvoc.com/aConcept_prefLabel_es> <http://www.w3.org/2008/05/skos-xl#literalForm> "un concepto"@es .
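
The expansion step can be illustrated with a small Python sketch. This is not how neosemantics or any real RDF parser is implemented; the `expand` function and the `PREFIXES` dictionary are hypothetical names used only for illustration. It shows that once prefixed names are expanded, the resulting triple carries full URIs and nothing in it remembers which prefix was used.

```python
# Hypothetical illustration: prefixes exist only in the serialisation,
# while the parsed triples hold full URIs.
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "skosxl": "http://www.w3.org/2008/05/skos-xl#",
}


def expand(term: str, prefixes: dict[str, str]) -> str:
    """Expand a prefixed name such as 'skos:broader' into a full URI.

    Terms already written as <full-uri> are returned without the brackets.
    """
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]
    prefix, _, local = term.partition(":")
    return prefixes[prefix] + local


# One triple from the Turtle document, as written by the author.
triple = (
    "<https://myvoc.com/aConcept>",
    "skos:broader",
    "<https://myvoc.com/aBroaderConcept>",
)

# After expansion, the word "skos" no longer appears anywhere in the data.
expanded = tuple(expand(term, PREFIXES) for term in triple)
print(expanded)
```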

Now, after that explanation, I have to say that I agree with you: when using the shortened notation for URIs in Neo4j (skos__broader, skosxl__prefLabel, myvoc__Category1, ...), it is very useful to be able to keep the namespace prefixes that we're used to. That is why we added two procedures that can help:

With n10s.nsprefixes.add you can define prefix-namespace pairs beforehand so when you import RDF using these namespaces, the predefined prefix will be used instead of having a sequential one generated dynamically by neosemantics.

call n10s.nsprefixes.add("skosxl","http://www.w3.org/2008/05/skos-xl#")
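
To make the effect of a predefined prefix concrete, here is a minimal Python sketch of the shortening logic described above. It is an assumption-laden illustration, not the neosemantics implementation: `split_uri` and `shorten` are hypothetical helpers, and the `ns0`, `ns1`, ... naming of generated prefixes is assumed here purely for the example.

```python
def split_uri(uri: str) -> tuple[str, str]:
    """Split a URI into (namespace, local name) at the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return uri[:cut], uri[cut:]


def shorten(uri: str, prefixes: dict[str, str]) -> str:
    """Return 'prefix__local' for a URI.

    `prefixes` maps namespace -> prefix and is updated in place: a
    namespace with no prefix yet gets the first unused sequential one
    (the 'nsN' naming scheme is an assumption made for this sketch).
    """
    namespace, local = split_uri(uri)
    if namespace not in prefixes:
        n = 0
        while f"ns{n}" in prefixes.values():
            n += 1
        prefixes[namespace] = f"ns{n}"
    return f"{prefixes[namespace]}__{local}"


uri = "http://www.w3.org/2008/05/skos-xl#prefLabel"

# Without a predefined prefix, a sequential one is generated.
print(shorten(uri, {}))

# With the prefix registered beforehand, the familiar name is used.
print(shorten(uri, {"http://www.w3.org/2008/05/skos-xl#": "skosxl"}))
```

The second call is the situation you get after registering the prefix before importing: the short name is the one you already know from your RDF document.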

The problem with n10s.nsprefixes.add is that prefixes have to be added one by one, which can be tedious. Sometimes what we want is to grab all the definitions in our RDF document and add them at once.
For this you can use the n10s.nsprefixes.addFromText procedure, passing as its parameter just a piece of text from the header of your RDF document. It does not matter whether it is the header section of a Turtle, RDF/XML or JSON-LD document: addFromText will try its best to extract the prefix definitions from it and add them to the set defined in Neo4j, so that they are used when you import your RDF.

call n10s.nsprefixes.addFromText('
@prefix neo4voc: <http://neo4j.org/vocab/sw#> .
@prefix neo4ind: <http://neo4j.org/ind#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
')
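
As a rough idea of what extracting prefix definitions from a header involves, here is a Python sketch that handles only the Turtle `@prefix` form with a regular expression. It is a simplified stand-in, not the code behind addFromText: `PREFIX_RE` and `extract_prefixes` are hypothetical names, and RDF/XML or JSON-LD headers are deliberately not covered.

```python
import re

# Matches Turtle declarations such as:
#   @prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
# The prefix name is optional so that the empty prefix ("@prefix : <...> .")
# is also accepted.
PREFIX_RE = re.compile(r"@prefix\s+([A-Za-z][\w-]*)?:\s*<([^>]*)>\s*\.")


def extract_prefixes(text: str) -> dict[str, str]:
    """Return {prefix: namespace} for every @prefix declaration in `text`."""
    return {m.group(1) or "": m.group(2) for m in PREFIX_RE.finditer(text)}


header = """
@prefix neo4voc: <http://neo4j.org/vocab/sw#> .
@prefix neo4ind: <http://neo4j.org/ind#> .
@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .
"""

for prefix, namespace in extract_prefixes(header).items():
    print(prefix, "->", namespace)
```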

Well, long explanation but I hope it will help you as well as other users in your situation.

Let me know your thoughts and please share your experience with neosemantics. It will help us make it better.

Cheers,

JB.

r_moquin · ‎04-20-2020

I agree with your points; it definitely all boils down to a set of tradeoffs.
You have to have something in the resource names that lets you identify them, but the full IRI is a pain in the butt to read. Doing what I suggested above also involves a tradeoff, as you mentioned, and it makes perfect sense not to do it for the reasons you gave. Thanks for your detailed explanation of how you view it.

I'll keep providing feedback as I continue to use your library. I have always been curious about how others would approach loading RDF into a property graph store.

Ryan

Feedback on Neosemantics with Neo4j 4.0