Neo4j

yogtha88 · ‎02-05-2020

I have the following CSV file to read from:

name,parent,resourceType,title,discription
technique,,,,trhrhrh
workflow,,,,trhrhrh
No-bake,technique,,,fgfg
No-bake,workflow,,,fgfg
sand,,,,trhrthr

Out of this I need to create a parent-child graph with language and external reference. 3 different types of label exist: Taxonomy, TaxonomyLanguage, and ExternalReference. I have the query below, which creates a single child shared between the parents. Instead, 2 separate nodes should be created, each attached to its parent with an IS_CHILD_OF relationship. For example, only a single No-bake node is getting created instead of 2 separate nodes.

andrew_bowman · ‎02-05-2020

Okay, let's go through that query.

MERGE (language:TaxonomyLanguage { name: line.name})
ON MATCH SET language.name = line.name
ON CREATE SET language.name = line.name

This will look for an existing :TaxonomyLanguage node with the given name, and if it doesn't exist, create it.
Since you're already merging the node by its name property, the property is already set, so neither the ON MATCH SET nor the ON CREATE SET is needed. You can drop both of those.
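
With both clauses dropped, that first step reduces to a single MERGE. A minimal sketch, reusing the file name from the full query further down in this reply:

```cypher
LOAD CSV WITH HEADERS FROM 'file:///testtagfinal.csv' AS line
// name is part of the MERGE pattern, so it is set whether the node is matched or created
MERGE (language:TaxonomyLanguage {name: line.name})
```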

MERGE (parentLanguage:TaxonomyLanguage { name: coalesce(line.parent,"receipe")}) //assign root node name here from database

This finds an existing :TaxonomyLanguage node whose name comes from line.parent (falling back to "receipe" when line.parent is empty), and creates that node if it doesn't exist.
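
To see the fallback in isolation, here is a small sketch. It assumes an empty parent field reaches the query as null, which is what the coalesce call in the query relies on; the two-element list just stands in for CSV rows:

```cypher
// One row without a parent, one row with a parent
UNWIND [null, "technique"] AS parent
RETURN coalesce(parent, "receipe") AS parentName
```

The first row comes back as "receipe" and the second as "technique".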

MERGE (language)-[:IN_Language]->(t1:Taxonomy)

You merge an [:IN_Language] relationship between the language node and some :Taxonomy node. Note that t1 hasn't been used yet; it is a brand new variable. So this will match a :Taxonomy node if such a pattern already exists; otherwise it will create a new relationship to a brand new :Taxonomy node (with no properties). I'm not sure if this is doing what you want.

MERGE (t1)-[:HAS_EXTERNAL_REF]->(:ExternalReference{source:"AEM", externalId:"23423d42", lastUpdated: datetime()})

From that potentially new t1:Taxonomy node, you MERGE this pattern. Note that if such a pattern doesn't exist, it may create a brand new :ExternalReference node with those properties. If there is already such a node in your graph (but not attached to the t1 node) it won't be used since you didn't MATCH to it first.
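
A related point: because lastUpdated: datetime() is part of the MERGE pattern, an :ExternalReference node created by an earlier run with a different timestamp will not match, and a new one gets created. One possible adjustment (my suggestion, not something in your original query) is to match only on the identifying properties and set the timestamp on creation. The variable name ref is introduced here just for illustration:

```cypher
LOAD CSV WITH HEADERS FROM 'file:///testtagfinal.csv' AS line
MERGE (language:TaxonomyLanguage {name: line.name})
MERGE (language)-[:IN_Language]->(t1:Taxonomy)
// Match on source and externalId only; the timestamp is kept out of the pattern
MERGE (t1)-[:HAS_EXTERNAL_REF]->(ref:ExternalReference {source: "AEM", externalId: "23423d42"})
  ON CREATE SET ref.lastUpdated = datetime()
```

This still scopes the :ExternalReference node to each t1, as the original pattern does.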

MERGE (parentLanguage)-[:IN_Language]->(t2:Taxonomy)

Here you are first checking if the parentLanguage node from earlier is attached to a :Taxonomy node by the given relationship. If such a pattern doesn't already exist in your graph, then a brand new :Taxonomy node (with no properties) will be created and used for this pattern.

MERGE (t1)-[:IS_CHILD_OF]->(t2)

This will merge that relationship between the two (potentially newly-created) t1 and t2 nodes.

Anything wrong with this so far?

If you wanted two separate No-bake nodes, one attached to each parent, then you shouldn't MERGE on language.

Instead you should MERGE the parent first, and then MERGE the language node into the parent along with the relationship:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///testtagfinal.csv' AS line
MERGE (parentLanguage:TaxonomyLanguage { name: coalesce(line.parent,"receipe")}) //assign root node name here from database
MERGE (parentLanguage)-[:IN_Language]->(t2:Taxonomy)
MERGE (t1:Taxonomy)-[:IS_CHILD_OF]->(t2)
MERGE (language:TaxonomyLanguage { name: line.name})-[:IN_Language]->(t1)
MERGE (t1)-[:HAS_EXTERNAL_REF]->(:ExternalReference{source:"AEM", externalId:"23423d42", lastUpdated: datetime()})

This starts with the parentLanguage node (merging it if it doesn't already exist), merges it to its :Taxonomy node (creating that if needed), matches or creates a child :Taxonomy node under the parent taxonomy, and then merges the language node of the given name, which must be in that child taxonomy.

So looking at the No-bake lines in the CSV, we're starting from two parent languages, technique and workflow, finding their :Taxonomy nodes and the children of those, then for each of those child :Taxonomy nodes merging the pattern to a No-bake :TaxonomyLanguage node. Since we're merging the pattern from t1, and not merging language alone, a new node will be created for each.
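
One way to check the result after the import is a quick count. This is only a sketch, assuming the labels and relationship types used in the query above:

```cypher
// How many distinct No-bake language nodes exist, and which parents do they hang under?
MATCH (child:TaxonomyLanguage {name: "No-bake"})-[:IN_Language]->(:Taxonomy)
      -[:IS_CHILD_OF]->(:Taxonomy)<-[:IN_Language]-(parent:TaxonomyLanguage)
RETURN count(DISTINCT child) AS noBakeNodes, collect(DISTINCT parent.name) AS parents
```

A noBakeNodes value of 1 with two parents listed means a single node is shared between them; 2 means each parent has its own.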

yogtha88 · ‎02-05-2020

Your query is giving me 3 separate graphs.

andrew_bowman · ‎02-05-2020

Can you verify if there is existing data in your graph, or if this is all new?

yogtha88 · ‎02-05-2020

Here is the existing node (I just have the receipe node).

andrew_bowman · ‎02-05-2020

Ah, okay.

So, this is a problem with your data, because the order of ingestion and whether parents or children are being processed first will lead to different graphs being created.

That is, if we create the parents first, and assume that each child is unique for the parent, we can get in trouble.

That's the approach I used: MERGE the parents, and then per parent, merge/create a child for that parent.

So first receipe and technique and workflow are merged. Then for each of those, a new child is created:

receipe will get a technique, a workflow, and a sand child, but these are not the same nodes that we merged as parents, which is why we're getting different graphs.

In your approach, the children are merged first, and since you're using MERGE you will never get duplicate nodes for a given name...there will be only one No-bake node, which is not what you want.

You could change the MERGE to a CREATE on the language node, and get rid of the ON CREATE and ON MATCH clauses, and this should get you what you want:

LOAD CSV WITH HEADERS FROM 'file:///testtagfinal.csv' AS line
CREATE (language:TaxonomyLanguage { name: line.name})
MERGE (parentLanguage:TaxonomyLanguage { name: coalesce(line.parent,"receipe")}) //assign root node name here from database
MERGE (language)-[:IN_Language]->(t1:Taxonomy)
MERGE (t1)-[:HAS_EXTERNAL_REF]->(:ExternalReference{source:"AEM", externalId:"23423d42", lastUpdated: datetime()})
MERGE (parentLanguage)-[:IN_Language]->(t2:Taxonomy)
MERGE (t1)-[:IS_CHILD_OF]->(t2)

However, while this may work with this test data set, it may not work with your full data set. You need to make sure that every language node here is meant to be unique, and more importantly, that the children you're adding should be added to all matches to your parent nodes.

That is, if you have data to load nodes as children of the No-bake nodes, different children for each, your query might not properly capture this.

See what this looks like on a larger data set.

yogtha88 · ‎02-05-2020

This should be my desired output.

yogtha88 · ‎02-05-2020

Wow, this works! Thanks, sir.
Just one problem: I don't want to create duplicate children under the same parent. For example, if my CSV contains 2 No-bake rows under technique (same hierarchy), I simply want to ignore the repeat (it should not create a duplicate node under the same hierarchy). But duplicates under different hierarchies should be allowed. For example, No-bake can be created under both technique and workflow, but there should not be 2 No-bake nodes under technique.

name,parent,resourceType,title,discription
technique,,,,trhrhrh
workflow,,,,trhrhrh
No-bake,technique,,,fgfg
No-bake,technique,,,fgfg
No-bake,workflow,,,fgfg
sand,,,,trhrthr

pl

yogtha88 · ‎02-05-2020

Business case: virus under animal has a different context than virus under disease, so the graph can have 2 virus nodes, which helps when searching by context.

andrew_bowman · ‎02-13-2020

Since we're using MERGEs for the query, there shouldn't be duplicates created. Give it a try and let me know.
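
That holds for the parts of the query that use MERGE. If you are running the variant that uses CREATE for the language node, a repeated row would create another node each time. One possible guard, as a sketch only, is to collapse repeated (name, parent) rows before anything is created. The aliases name and parent are introduced here for illustration:

```cypher
LOAD CSV WITH HEADERS FROM 'file:///testtagfinal.csv' AS line
// Collapse repeated (name, parent) rows so each pair is processed once
WITH DISTINCT line.name AS name, coalesce(line.parent, "receipe") AS parent
CREATE (language:TaxonomyLanguage {name: name})
MERGE (parentLanguage:TaxonomyLanguage {name: parent})
MERGE (language)-[:IN_Language]->(t1:Taxonomy)
MERGE (parentLanguage)-[:IN_Language]->(t2:Taxonomy)
MERGE (t1)-[:IS_CHILD_OF]->(t2)
```

This only removes repeats within a single file. Running the same import twice would still create the language nodes again, since CREATE never matches existing data.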

My concern is for how deep this hierarchy goes. If it's only these levels, then you might be okay. But if your graph hierarchy is any deeper, then you have a problem with your input data, in that you don't have enough information to specify the unique nodes you want to work with.

It's similar to a directory structure. It's possible to have several directories of the same name, but in different places in the hierarchy, and if you don't have enough information to uniquely identify the ones that you want to create children for, then you'll be adding children to some or all of them at once. You may need a way to represent in your import data the path/hierarchy to the parent node in question, use that to match to the exact node you want, then continue with the query.
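
As a rough sketch of what that could look like: the path and parentPath columns, the path property, and the file name below are all hypothetical (they are not in your current CSV), and it assumes the existing root node already has path "receipe":

```cypher
// Hypothetical CSV layout, one full path per node:
//   name,path,parentPath
//   technique,receipe/technique,receipe
//   No-bake,receipe/technique/No-bake,receipe/technique
//   No-bake,receipe/workflow/No-bake,receipe/workflow
LOAD CSV WITH HEADERS FROM 'file:///testtagfinal_with_paths.csv' AS line
// The full path, not the name, identifies a node, so same-named nodes stay distinct
MERGE (parentLanguage:TaxonomyLanguage {path: coalesce(line.parentPath, "receipe")})
MERGE (language:TaxonomyLanguage {path: line.path})
SET language.name = line.name
MERGE (language)-[:IN_Language]->(t1:Taxonomy)
MERGE (parentLanguage)-[:IN_Language]->(t2:Taxonomy)
MERGE (t1)-[:IS_CHILD_OF]->(t2)
```

Because everything is merged on the path, a repeated row for the same path matches the existing node instead of creating a second one, and children attach to exactly one parent regardless of how deep the hierarchy goes.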

Neo4j graph creates single child under different parents