Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
08-05-2022 12:03 AM - edited 08-05-2022 12:04 AM
Hi,
I want to create a database from a CSV file using LOAD CSV. A column called Problem contains a unique value in each cell. So I'm going to generate a node from each one of the values, belonging to the p: product label.
Two other columns -Patient and Medication- may have a unique value or multiple values in each of their cells (and, sometimes, are empty).
So, I want to split cells containing multiple values in Problem Column in order to generate separated nodes. Nodes from the column patient will belong to the label p.patient, and nodes from the Medication column will belong to the label m.medication.
Test Data: Drive Link
I am successful with this cypher query to load and transform data except splitting the row which has multi value.
CREATE CONSTRAINT FOR (p:Patient) REQUIRE p.id IS UNIQUE;
CREATE CONSTRAINT FOR (p:Problem) REQUIRE p.name IS UNIQUE;
CREATE CONSTRAINT FOR (t:Test) REQUIRE t.name IS UNIQUE;
CREATE CONSTRAINT FOR (m:Medication) REQUIRE m.name IS UNIQUE;
// Patient id, Problem, Test, Medication.
LOAD CSV WITH HEADERS FROM
'file:///tathasthu1.csv' AS row
WITH row
CREATE (patient:Patient {id: row.Patient})
MERGE (problem:Problem {name: row.Problem})
MERGE (test:Test {name: row.Test})
MERGE (medication:Medication {name: row.Medication})
CREATE (patient)-[:HAS]->(problem)
CREATE (patient)-[:PERFORMED]->(test)
CREATE (test)-[:TAKEN_FOR]->(problem)
CREATE (patient)-[:TAKES]->(medication)
CREATE (medication)-[:CAN_TREAT]->(problem)
;
--
CREATE CONSTRAINT FOR (s:Subproblem) REQUIRE s.name is UNIQUE;
LOAD CSV WITH HEADERS FROM
'file:///tathasthu1.csv' AS row
WITH row WHERE NOT row.Subproblem IS null
MATCH (problem:Problem {name: row.Problem})
MATCH (medication:Medication {name: row.Medication})
MERGE(subproblem:Subproblem {name: row.Subproblem})
CREATE (subproblem)-[:IN_CATEGORY]->(problem)
CREATE (medication)-[:SIDE_EFFECT]->(subproblem)
SET subproblem.name = row.Subproblem
Now I have to split Glaucoma and Chronic Pain value row into two nodes, because as per test data, vision loss is due to chronic pain but not to Glaucoma as mentioned in the Details column.
Help is appreciated. Thank you.
Solved! Go to Solution.
08-05-2022 01:20 AM
These queries should do what you want:
// Create constraints
CREATE CONSTRAINT FOR (p:Patient) REQUIRE p.id IS UNIQUE;
CREATE CONSTRAINT FOR (p:Problem) REQUIRE p.name IS UNIQUE;
CREATE CONSTRAINT FOR (t:Test) REQUIRE t.name IS UNIQUE;
CREATE CONSTRAINT FOR (m:Medication) REQUIRE m.name IS UNIQUE;
CREATE CONSTRAINT FOR (s:Subproblem) REQUIRE s.name is UNIQUE;
// Create nodes
LOAD CSV WITH HEADERS FROM 'file:///tathasthu1.csv' AS row
WITH row
MERGE (:Patient {id: row.Patient})
MERGE (:Test {name: row.Test})
MERGE (:Medication {name: row.Medication})
WITH row
CALL apoc.do.when(row.Subproblem IS NOT NULL,
'MERGE (:Subproblem {name: row.Subproblem}) RETURN row',
'RETURN row',
{row: row}
)
YIELD value
WITH split(row.Problem, ", ") AS problems
UNWIND problems AS problem
MERGE (:Problem {name: problem});
// Create relationships
LOAD CSV WITH HEADERS FROM 'file:///tathasthu1.csv' AS row
WITH row
MATCH (p:Patient {id: row.Patient})
MATCH (t:Test {name: row.Test})
MATCH (m:Medication {name: row.Medication})
CREATE (p)-[:PERFORMED]->(t)
CREATE (p)-[:TAKES]->(m)
WITH row, p, t, m
CALL apoc.do.when(row.Subproblem IS NOT NULL,
'MATCH (s:Subproblem {name: row.Subproblem}) CREATE (m)-[:SIDE_EFFECT]->(s) RETURN s',
'RETURN null AS s',
{row: row, m: m}
)
YIELD value
WITH split(row.Problem, ", ") AS problems, value.s AS subproblem, p, t, m
UNWIND problems AS problem
MATCH (prob:Problem {name: problem})
CREATE (p)-[:HAS]->(prob)
CREATE (t)-[:TAKEN_FOR]->(prob)
CREATE (m)-[:CAN_TREAT]->(prob)
WITH prob, subproblem
WHERE subproblem IS NOT NULL
CREATE (subproblem)-[:IN_CATEGORY]->(prob);
Regards,
Cobra
08-05-2022 12:17 AM
Hello 😄
In your example, which cell in your CSV has multiple values?
Regards,
Cobra
08-05-2022 12:31 AM
Patient B0021 with Problem. Cell number: B2
08-05-2022 12:59 AM
All columns except Patient one can have multiple values?
08-05-2022 01:20 AM
These queries should do what you want:
// Create constraints
CREATE CONSTRAINT FOR (p:Patient) REQUIRE p.id IS UNIQUE;
CREATE CONSTRAINT FOR (p:Problem) REQUIRE p.name IS UNIQUE;
CREATE CONSTRAINT FOR (t:Test) REQUIRE t.name IS UNIQUE;
CREATE CONSTRAINT FOR (m:Medication) REQUIRE m.name IS UNIQUE;
CREATE CONSTRAINT FOR (s:Subproblem) REQUIRE s.name is UNIQUE;
// Create nodes
LOAD CSV WITH HEADERS FROM 'file:///tathasthu1.csv' AS row
WITH row
MERGE (:Patient {id: row.Patient})
MERGE (:Test {name: row.Test})
MERGE (:Medication {name: row.Medication})
WITH row
CALL apoc.do.when(row.Subproblem IS NOT NULL,
'MERGE (:Subproblem {name: row.Subproblem}) RETURN row',
'RETURN row',
{row: row}
)
YIELD value
WITH split(row.Problem, ", ") AS problems
UNWIND problems AS problem
MERGE (:Problem {name: problem});
// Create relationships
LOAD CSV WITH HEADERS FROM 'file:///tathasthu1.csv' AS row
WITH row
MATCH (p:Patient {id: row.Patient})
MATCH (t:Test {name: row.Test})
MATCH (m:Medication {name: row.Medication})
CREATE (p)-[:PERFORMED]->(t)
CREATE (p)-[:TAKES]->(m)
WITH row, p, t, m
CALL apoc.do.when(row.Subproblem IS NOT NULL,
'MATCH (s:Subproblem {name: row.Subproblem}) CREATE (m)-[:SIDE_EFFECT]->(s) RETURN s',
'RETURN null AS s',
{row: row, m: m}
)
YIELD value
WITH split(row.Problem, ", ") AS problems, value.s AS subproblem, p, t, m
UNWIND problems AS problem
MATCH (prob:Problem {name: problem})
CREATE (p)-[:HAS]->(prob)
CREATE (t)-[:TAKEN_FOR]->(prob)
CREATE (m)-[:CAN_TREAT]->(prob)
WITH prob, subproblem
WHERE subproblem IS NOT NULL
CREATE (subproblem)-[:IN_CATEGORY]->(prob);
Regards,
Cobra
08-05-2022 01:40 AM
I am getting this error!
08-05-2022 02:14 AM - edited 08-05-2022 02:19 AM
Thank you so much. This worked.
Could you help me to understand the following:
1. Why is yield used?
2. How cypher is reading prop as problem because not in any clause you mentioned the same.
MATCH (prob:Problem {name: problem})
3. If i want the subproblem to be connected to one problem, but to both split values, how to do that!
Thanks.
08-05-2022 02:22 AM
08-05-2022 02:28 AM - edited 08-05-2022 02:31 AM
Thank you.
3. If i want the subproblem to be connected to one problem, but not to both split values, then may I know how to do that?
08-05-2022 02:42 AM
Do you have a rule that works all the time? For example always connect to the first problem.
Otherwise, I will need more details about your use case.
08-05-2022 06:06 AM
No there no such rule. It depends on the problem and subproblem!
Kindly look at the following image, here you see Glaucoma doesn't have a relationship to sub problem, but chronic pain does have a relationship to sub problem.
So, now with the Split, the values in the problem row are separated. But, what if I want to connect one value from the same cell to the sub problem but not the other one!
08-05-2022 06:27 AM
Well I think you will have create 2 CSVs, one for nodes and the other one for relationships. Since it's specific, you will have to define them before to load them in the database.
08-08-2022 04:54 AM
Instead of 2 CSV's I have created a separate column for each and added more rows. Now in this file (FILE), you can see there are one or two null values in each column.
// Create constraints
CREATE CONSTRAINT FOR (p:Patient) REQUIRE p.id IS UNIQUE;
CREATE CONSTRAINT FOR (d:Disease) REQUIRE d.name IS UNIQUE;
CREATE CONSTRAINT FOR (t:Test) REQUIRE t.name IS UNIQUE;
CREATE CONSTRAINT FOR (m:Medication) REQUIRE m.name IS UNIQUE;
CREATE CONSTRAINT FOR (s:Sub_disease) REQUIRE s.name is UNIQUE;
CREATE CONSTRAINT FOR (dm:Medication_details) REQUIRE dm.name is UNIQUE;
CREATE CONSTRAINT FOR (td:Test_details) REQUIRE td.name is UNIQUE;
// Create nodes
LOAD CSV WITH HEADERS FROM 'file:///healthcare.csv' AS row
WITH row
MERGE (:Patient {id: row.Patient})
MERGE (:Test {name: row.Test})
MERGE (:Medication {name: row.Medication})
MERGE (:Disease {name: row.Disease});
WITH row
CALL apoc.do.when(row.Sub_disease IS NOT NULL,
'MERGE (:Sub_disease {name: row.Sub_disease}) RETURN row',
'RETURN row',
{row: row}
CALL apoc.do.when(row.Detailmed is NOT NULL,
'MERGE (:Detailmed {name: row.Medical_details}) RETURN row',
'RETURN row',
{row: row}
CALL apoc.do.when(row.Test_details is NOT NULL,
'MERGE (:Test_details {name: row.Test_details}) RETURN row',
'RETURN row',
{row: row}
)
YIELD value
// Create relationships
LOAD CSV WITH HEADERS FROM 'file:///healthcare.csv' AS row
WITH row
MATCH (p:Patient {id: row.Patient})
MATCH (t:Test {name: row.Test})
MATCH (m:Medication {name: row.Medication})
MATCH (dm:Medical_details {name: row.Medica_details})
MATCH (tm:Test_detials {name: row.Test_details})
CREATE (p)-[:HAS]->(d)
CREATE (p)-[:PERFORMED]->(t)
CREATE (p)-[:TAKES]->(m)
WITH row, p, t, m,
CALL apoc.do.when(row.Sub_disease IS NOT NULL,
'MATCH (s:Sub_disease {name: row.Sub_diease}) CREATE (s)-[:CAN_BE_ASSOCIATED_WITH]->(d) RETURN s',
'RETURN null AS s',
{row: row, d: d}
CALL apoc.do.when(row.Medical_detials is NOT NULL,
'MATCH (:Medical_details {name: row.Medical_detials}) CREATE (m)-[:IS_A]->(dm) RETURN dm',
'RETURN null AS dm',
{row: row, m: m}
CALL apoc.do.when(row.Test_details is NOT NULL,
'MATCH (:Testdetails {name: row.Test_details}) CREATE (td)-[:IS_A_TEST_OF]->(t) RETURN td',
'RETURN null AS td',
{row: row, td: td}
)
YIELD value
CREATE (t)-[:TAKEN_FOR]->(d)
CREATE (m)-[:CAN_TREAT]->(d)
;
I have shared example of one patient about how I want to connect.
Following is the error I am getting,
Thanks in advance.
08-08-2022 04:57 AM
I guess you have a problem of copy-paste. The beginning of the query is missing.
08-08-2022 05:17 AM
I have the one I pasted. I think I am missing something in my query!
08-08-2022 06:15 AM
I will see but I cannot access your new file
08-09-2022 12:19 AM
There are a lot of typos in your queries and column names are not corresponding to what you write in the queries... You could at least have a look at the documentation...
08-09-2022 10:10 PM
I changed the column name while sharing it with you. Sometimes I am getting errors when there is a gap between two words. And wanted to know how avoid that error.
08-09-2022 01:15 AM
// Create constraints
CREATE CONSTRAINT FOR (p:Patient) REQUIRE p.id IS UNIQUE;
CREATE CONSTRAINT FOR (d:Disease) REQUIRE d.name IS UNIQUE;
CREATE CONSTRAINT FOR (t:Test) REQUIRE t.name IS UNIQUE;
CREATE CONSTRAINT FOR (m:Medication) REQUIRE m.name IS UNIQUE;
CREATE CONSTRAINT FOR (s:SubDisease) REQUIRE s.name is UNIQUE;
CREATE CONSTRAINT FOR (dm:MedicationDetails) REQUIRE dm.name is UNIQUE;
CREATE CONSTRAINT FOR (td:TestDetails) REQUIRE td.name is UNIQUE;
// Create nodes
LOAD CSV WITH HEADERS FROM 'file:///healthcare.csv' AS row
WITH row
MERGE (:Patient {id: row.Patient})
MERGE (:Test {name: row.Test})
MERGE (:Medication {name: row.Medication})
MERGE (:Disease {name: row.Disease})
WITH row
CALL apoc.do.when(
row.Subdisease IS NOT NULL,
'MERGE (:SubDisease {name: row.Subdisease}) RETURN row',
'RETURN row',
{row: row}
)
YIELD value
WITH row
CALL apoc.do.when(
row.Detailmed IS NOT NULL,
'MERGE (:MedicationDetails {name: row.Detailmed}) RETURN row',
'RETURN row',
{row: row}
)
YIELD value
WITH row
CALL apoc.do.when(
row.Testdetails IS NOT NULL,
'MERGE (:TestDetails {name: row.Testdetails}) RETURN row',
'RETURN row',
{row: row}
)
YIELD value
RETURN row;
// Create relationships
LOAD CSV WITH HEADERS FROM 'file:///healthcare.csv' AS row
WITH row
MATCH (p:Patient {id: row.Patient})
MATCH (t:Test {name: row.Test})
MATCH (m:Medication {name: row.Medication})
MATCH (d:Disease {name: row.Disease})
MERGE (p)-[:PERFORMED]->(t)
MERGE (p)-[:TAKES]->(m)
MERGE (p)-[:HAS]->(d)
MERGE (t)-[:TAKEN_FOR]->(d)
MERGE (m)-[:CAN_TREAT]->(d)
WITH row, p, t, m, d
CALL apoc.do.when(
row.Detailmed IS NOT NULL,
'MATCH (n:MedicationDetails {name: row.Detailmed}) MERGE (m)-[:IS_A]->(n) RETURN row',
'RETURN row',
{row: row, m: m}
)
YIELD value
WITH row, p, t, m, d
CALL apoc.do.when(
row.Testdetails IS NOT NULL,
'MATCH (n:TestDetails {name: row.Testdetails}) MERGE (n)-[:IS_A_TEST_OF]->(t) RETURN row',
'RETURN row',
{row: row, t: t}
)
YIELD value
WITH row, p, t, m, d
CALL apoc.do.when(
row.Subdisease IS NOT NULL,
'MATCH (s:SubDisease {name: row.Subdisease}) MERGE (s)-[:CAN_BE_ASSOCIATED_WITH]->(d) RETURN row',
'RETURN row',
{row: row, d: d}
)
YIELD value
RETURN row;
Regards,
Cobra
08-09-2022 10:12 PM
Thank you so much.
So, here apart from the column name,
My mistake was not adding WITH clause before every Apoc.do.when? And
08-10-2022 12:07 AM
There was a ; in the middle of the query, WITH clauses were missing, end brackets of apoc.do.when() were missing, column names were different...
08-10-2022 02:33 AM
Thank you Cobra.
All the sessions of the conference are now available online