Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
12-04-2018 02:00 PM
Hi,
I am working on a problem, where I have to create a node only when the value exists in the column and create no nodes otherwise for null values.
Should I use CREATE or MATCH?
My query:
LOAD CSV WITH HEADERS FROM "file:///Latest_Matrix.csv" AS line WITH line
WITH line
WHERE NOT line.Form Name Extension
IS NULL and NOT line.Study Title
is NULL
and NOT line.Domain Label
is NULL and NOT line.SDTM Variable Name
is NULL
MERGE (a: Study {name: line.Study Title
, type: 'Study'})
//CREATE (domain: Domain {name: line.Domain})
mergE (b: Domain {name: line.Domain Label
, type: 'SDTM Domain', domain_identifier: line.Study Title
+ line.Domain Label
})
Merge (e: Target_Variable {name: line.SDTM Variable Name
,type: 'SDTM Domain Variable', transformation:line.Transformation Code
, transformation_type:line.Transformation Type
,SDTM_Domain_Label:line.SDTM Variable Label
,
field_identifier: line.Study Title
+ line.Domain Label
+ line.SDTM Variable Name
})
CREATE (d: Source_Variable {name: line.Form Variable Name
,type: 'Source_Form_Variable', field_identifier: line.Study Title
+ line.Domain Label
})
Merge (c: Source_File {name: line.Form Name
,type: 'Source_Form_File', form_name_extension:line.Form Name Extension
, file_identifier: line.Study Title
+ line.Domain Label
+ line.Form Name
})
MERGE (e)-[q:Constitutes]->(b)
MERGE (b)-[p:Part_of]->(a)
MERGE (c)-[u:Has]->(d)
MERGE (d)-[u:transforms]->(e)
RETURN a,b,e;
I am confused on the part: CREATE (d: Source_Variable ......
12-04-2018 07:05 PM
Why wouldn't you just use MERGE? I find myself rarely using CREATE except on an initialization CYPHER script to create some placeholder nodes when re-building my DB cleanly (when I KNOW the node won't already exist).
MATCH would be appropriate if you aren't sure if it exists, and you want to bail out if the node doesn't already exist. Do you want the :Source_Variable node created if it doesn't already exist? If so, then go with MERGE.
Or are you concerned about "Form Variable Name" being null?
12-04-2018 09:56 PM
Thanks for your response Paul.
I am concerned about Form Variable Name" being null. I do not want to create a node when this value is null.
So suppose my column Form Variable Name" has values h,p,q,NULL,NULL,a , I want nodes to be created only for h,p,q,a and no nodes for NULL values.
I have two codes now:
This code does not give any output graph.
LOAD CSV WITH HEADERS FROM "file:///Latest_Matrix.csv" AS line WITH line
WITH line
WHERE NOT line.Form Name Extension
IS NULL and NOT line.Study Title
is NULL
and NOT line.Domain Label
is NULL and NOT line.SDTM Variable Name
is NULL
MERGE (a: Study {name: line.Study Title
, type: 'Study'})
//CREATE (domain: Domain {name: line.Domain})
mergE (b: Domain {name: line.Domain Label
, type: 'SDTM Domain', domain_identifier: line.Study Title
+ line.Domain Label
})
Merge (e: SDTM_Target_Variable {name: line.SDTM Variable Name
,type: 'SDTM Domain Variable', transformation:line.Transformation Code
, transformation_type:line.Transformation Type
,SDTM_Domain_Label:line.SDTM Variable Label
,
field_identifier: line.Study Title
+ line.Domain Label
+ line.SDTM Variable Name
})
Merge (c: Source_Form_File {name: line.Form Name
,type: 'Source_Form_File', form_name_extension:line.Form Name Extension
, file_identifier: line.Study Title
+ line.Domain Label
+ line.Form Name
})
WITH line
WHERE NOT line.Form Variable Name
IS NULL
MATCH (d: Source_Form_Variable {name: line.Form Variable Name
,type: 'Source_Form_Variable', field_identifier: line.Study Title
+ line.Domain Label
+ line.SDTM Variable Name
})
MERGE (c)-[u:Has]->(d)
MERGE (e)-[q:Constitutes]->(b)
MERGE (b)-[p:Part_of]->(a)
MERGE (d)-[g:transforms]->(e)
REMOVE b.domain_identifier
REMOVE e.field_identifier
REMOVE d.domain_identifier
REMOVE c.file_identifier
RETURN a,b,e;
This code creates named "Other" for all NULL values. Is there a way I could have just one node named "Other" and point all NULL to that?
LOAD CSV WITH HEADERS FROM "file:///Latest_Matrix.csv" AS line WITH line
WITH line
WHERE NOT line.Form Name Extension
IS NULL and NOT line.Study Title
is NULL
and NOT line.Domain Label
is NULL and NOT line.SDTM Variable Name
is NULL
MERGE (a: Study {name: line.Study Title
, type: 'Study'})
//CREATE (domain: Domain {name: line.Domain})
mergE (b: Domain {name: line.Domain Label
, type: 'SDTM Domain', domain_identifier: line.Study Title
+ line.Domain Label
})
Merge (e: SDTM_Target_Variable {name: line.SDTM Variable Name
,type: 'SDTM Domain Variable', transformation:line.Transformation Code
, transformation_type:line.Transformation Type
,SDTM_Domain_Label:line.SDTM Variable Label
,
field_identifier: line.Study Title
+ line.Domain Label
+ line.SDTM Variable Name
})
Merge (c: Source_Form_File {name: line.Form Name
,type: 'Source_Form_File', form_name_extension:line.Form Name Extension
, file_identifier: line.Study Title
+ line.Domain Label
+ line.Form Name
})
MATCH (d: Source_Form_Variable {name: coalesce(line.Form Variable Name
,"Other"),type: 'Source_Form_Variable', field_identifier: line.Study Title
+ line.Domain Label
+ line.SDTM Variable Name
})
MERGE (c)-[u:Has]->(d)
MERGE (e)-[q:Constitutes]->(b)
MERGE (b)-[p:Part_of]->(a)
MERGE (d)-[g:transforms]->(e)
REMOVE b.domain_identifier
REMOVE e.field_identifier
REMOVE d.domain_identifier
REMOVE c.file_identifier
RETURN a,b,e;
12-05-2018 12:00 AM
Replace this:
WITH line
WHERE NOT line.Form Variable Name IS NULL
MATCH (d: Source_Form_Variable {name: line.Form Variable Name,type: 'Source_Form_Variable', field_identifier: line.Study Title + line.Domain Label + line.SDTM Variable Name})
with
FOREACH(ignoreMe IN CASE WHEN line.Form Variable Name IS NOT NULL THEN [1] ELSE END |
CREATE (d: Source_Form_Variable {name: line.Form Variable Name, type: 'Source_Form_Variable', field_identifier: line.Study Title + line.Domain Label + line.SDTM Variable Name})like
)
You cannot use MERGE under FOREACH loop.
This way node, (d: Source_Form_Variable), will be created for non null values and no node will be created for null values.
-Kamal
12-05-2018 12:04 AM
FOREACH(ignoreMe IN CASE WHEN line.Form Variable Name IS NOT NULL THEN [1] ELSE END |
Here after ELSE two square brackets. Somehow it's appearing as a box when I post it here.
-Kamal
12-05-2018 07:28 AM
Hi Kamal,
I tried what you told. It gives me one null node for source variable and everything points to that. Even for the one where (d: Source_Form_Variable) is not null, it still doesn't create new nodes and points it to the empty node.
My code:
LOAD CSV WITH HEADERS FROM "file:///Latest_Matrix.csv" AS line WITH line
WITH line
WHERE NOT line.Form Name Extension
IS NULL and NOT line.Study Title
is NULL
and NOT line.Domain Label
is NULL and NOT line.SDTM Variable Name
is NULL
MERGE (a: Study {name: line.Study Title
, type: 'Study'})
//CREATE (domain: Domain {name: line.Domain})
mergE (b: Domain {name: line.Domain Label
, type: 'SDTM Domain', domain_identifier: line.Study Title
+ line.Domain Label
})
Merge (e: SDTM_Target_Variable {name: line.SDTM Variable Name
,type: 'SDTM Domain Variable', transformation:line.Transformation Code
, transformation_type:line.Transformation Type
,SDTM_Domain_Label:line.SDTM Variable Label
,
field_identifier: line.Study Title
+ line.Domain Label
+ line.SDTM Variable Name
})
Merge (c: Source_Form_File {name: line.Form Name
,type: 'Source_Form_File', form_name_extension:line.Form Name Extension
, file_identifier: line.Study Title
+ line.Domain Label
+ line.Form Name
})
FOREACH(ignoreMe IN CASE WHEN line.Form Variable Name
IS NOT NULL THEN [1] ELSE END |
CREATE (d: Source_Form_Variable {name: line.`Form Variable Name`,type: 'Source_Form_Variable', field_identifier: line.`Study Title` + line.`Domain Label` + line.`SDTM Variable Name`})
)
MERGE (c)-[u:Has]->(d)
MERGE (e)-[q:Constitutes]->(b)
MERGE (b)-[p:Part_of]->(a)
MERGE (d)-[g:transforms]->(e)
REMOVE b.domain_identifier
REMOVE e.field_identifier
REMOVE d.domain_identifier
REMOVE c.file_identifier
RETURN a,b,e;
12-05-2018 09:35 AM
Post one line of your .csv file to check the failure point.
-Kamal
12-05-2018 09:45 AM
Matrix ID | Study Title | Domain Name | Domain Label | Domain Name Extension | Sub Domain Label | Form Name | Form Label | Form Name Extension | Form Variable Name | Form Variable Label | SDTM Variable Name | SDTM Variable Label | Join Criteria | Transformation Type | Transformation Pseudo Code | Transformation Code |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Study123 | DM | Demographics | DM.csv | None | DM | Demographics | DM.CSV | PROJECT | STUDYID | Study Identifier | No Transformation | No Transformation | |||
2 | Study123 | DM | Demographics | DM.csv | None | DM | Demographics | DM.CSV | DOMAIN | Domain Abbreviation | Manual Entry | Text | "DM" |
12-05-2018 09:53 AM
Check to see if Form Variable Name is enclosed between back ticks in FOREACH statement. I might have missed it in my example.
12-05-2018 10:01 AM
Thanks. Form Variable Name column shows blank in second row. In this case you should use
FOREACH(ignoreMe IN CASE WHEN line.Form Variable Name
<> "" THEN [1] ELSE END |
12-05-2018 10:09 AM
Tried that. So the problem, it creates a null node and points everything to that.
I want line.Form Variable name
node "PROJECT" created and it should be related to line.Form Name
. But when we encounter empty cell for Form Variable name
it shouldn't create a node at all.
12-05-2018 02:13 PM
Here is the code that worked for me.
Replace part of your code with this:
FOREACH(ignoreMe IN CASE WHEN line.Form Variable Name IS NOT NULL THEN [1] ELSE END |
MERGE (d: Source_Form_Variable {name: line.Form Variable Name
,type: 'Source_Form_Variable', field_identifier: line.Study Title
+ line.Domain Label
+ line.SDTM Variable Name
})
MERGE (c)-[u:Has]->(d)
MERGE (d)-[g:transforms]->(e)
)
MERGE (e)-[q:Constitutes]->(b)
MERGE (b)-[p:Part_of]->(a)
MERGE seems to be working now under FOREACH.
-Kamal
All the sessions of the conference are now available online