Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
07-22-2022 11:15 PM
I am trying to convert this long SQL query to cypher. I have two types of node: patient and apachepatientresult. Both having a common relation using node property as patientunitstayid. How can I achieve this?. Please help
drop materialized view if exists labels cascade;
create materialized view labels as
-- select all the data we need from the apache predictions table, plus patient identifier and hospital identifier
-- information because we only want to select one episode per patient (more on this later)
with all_labels as (
select p.uniquepid, p.patienthealthsystemstayid, apr.patientunitstayid, p.unitvisitnumber,
apr.predictedhospitalmortality, apr.actualhospitalmortality, apr.predictediculos, apr.actualiculos
from patient as p
inner join apachepatientresult as apr
on p.patientunitstayid = apr.patientunitstayid
-- only use the most recent apache prediction model and exclude anyone who doesn't have at least 24 hours of data
where apr.apacheversion = 'IVa' and apr.actualiculos >= 1
)
select al.patientunitstayid, al.predictedhospitalmortality, al.actualhospitalmortality,
al.predictediculos, al.actualiculos
from all_labels as al
-- 'selection' is a table which will choose a random hospital stay (the lowest number is fine because the stays
-- are randomly ordered). In the case of multiple ICU stays within that hospital admission, it will choose the
-- first ICU stay that satisfies the 24 hours of data requirement. The rationale is that the model should be
-- applied as soon as there is 24 hours of continuous data within the hospital. This query extracts 89143 stays.
inner join (
select p.uniquepid, p.patienthealthsystemstayid, min(p.unitvisitnumber) as unitvisitnumber
from patient as p
inner join (
select uniquepid, min(patienthealthsystemstayid) as patienthealthsystemstayid
from all_labels
group by uniquepid
) as intermediate_selection
on p.patienthealthsystemstayid = intermediate_selection.patienthealthsystemstayid
group by p.uniquepid, p.patienthealthsystemstayid
) as selection
on al.patienthealthsystemstayid = selection.patienthealthsystemstayid
and al.unitvisitnumber = selection.unitvisitnumber;
I
Solved! Go to Solution.
07-24-2022 07:00 AM
I don't think there is a way around matching the patient nodes three times. The following query is the same, but breaking out the part that calculates the 'patienthealthsystemstayid' and 'unitvisitnumber' into a 'call' subquery makes it more understandable.
call {
match(p:patient)-[:HAS_RESULT]->(apr:apachepatientresult)
where apr.apacheversion = 'IVa' and apr.actualiculos >= 1
with p.uniquepid as uniquepid, min(p.patienthealthsystemstayid) as patienthealthsystemstayid
match(p:patient)
where p.patienthealthsystemstayid = patienthealthsystemstayid
with p.uniquepid as uniquepid, patienthealthsystemstayid, min(p.unitvisitnumber) as unitvisitnumber
return patienthealthsystemstayid, unitvisitnumber
}
match(p:patient)-[:HAS_RESULT]->(apr:apachepatientresult)
where apr.apacheversion = 'IVa' and apr.actualiculos >= 1
and p.patienthealthsystemstayid = patienthealthsystemstayid
and p.unitvisitnumber = unitvisitnumber
return p
By the way, I believe your original algorithm can produce more than one episode per patient. This can happen when a patient has the same combination of patienthealthsystemstayid and unitvisitnumber that is the same as another patient's minimum combination, but is not the same as this patient's combination. I refactored the above query to eliminate that possibility by returning only one record per patient which is the one that has their minimum combination of patienthealthsystemstayid and unitvisitnumber.
match(p:patient)-[:HAS_RESULT]->(apr:apachepatientresult)
where apr.apacheversion = 'IVa' and apr.actualiculos >= 1
with p.uniquepid as uniquepid, min(p.patienthealthsystemstayid) as patienthealthsystemstayid
call {
with uniquepid, patienthealthsystemstayid
match(p:patient)
where p.uniquepid = uniquepid and p.patienthealthsystemstayid = patienthealthsystemstayid
return min(p.unitvisitnumber) as unitvisitnumber
}
match(p:patient)
where p.uniquepid = uniquepid
and p.patienthealthsystemstayid = patienthealthsystemstayid
and p.unitvisitnumber = unitvisitnumber
return p
07-23-2022 06:35 AM
I am assuming you have a data model similar to this: 'match(p:Patient)-[:HAS_RESULT]->(apr:apachepatientresult)'
apr.apacheversion = 'IVa' and apr.actualiculos >= 1
The actual query is selecting from the above population each patient's row that corresponds to the patient's row that has the minimum 'patienthealthsystemstayid' and minimum 'unitvisitnumber'. If the 'patient' table represents, there would be only one record each, and these minimums would be the single values. As such, I am think the 'patient' table does not represent the patient information, but does it contain the patient's encounter information, and the other table is the results from each encounter?
Can you provide the relevant part of the database schema, i.e. these two tables?
07-23-2022 07:21 AM
07-23-2022 07:37 AM
Does the 'patent' table in sql represent a patient, i.e. one record per patient, or is it like a patient encounter table and the uniquepid is a foreign key that references the patient information? This would make more sense from the sql query. My confusing is how the query is finding the min values if it was a patient table. I can see how it would if 'patient' is really an encounter table, thus there would be multiple records in the 'patient' table for one real patient represented by the patient's uniquepid. Is this correct? If so, do you have another neo4j node that has the patient demographics?
07-23-2022 08:05 AM
The 'patient' table in sql does not represent a patient per record, but rather multiple stays of every patient in a hospital. Here is the link of the actual schema which I am following. https://eicu-crd.mit.edu/eicutables/patient/ . 'patientUnitStayID' is the PK and 'patientHealthSystemStayID' is the FK. I think my graph model for this sql query is wrong. I have designed each node as a patient stay that has all the demographic information of the patient. And for your last question only patient table has all the demographics. You can visit the link for more info.
Thanks
07-23-2022 08:16 AM
This is making a lot more sense. Maybe you shoukd relabel your Patient node to PatientVisit or something similar to distinguish it from the Patient.
Is ‘patientHealthSystemStayID’ a foreign key to health provider table?
I will review the schema. I think I can write the query now.
07-23-2022 09:46 AM
'patientHealthSystemStayID' is connected to the apacheApsVar table. I will recreate the model having patient node with the uniquepid, and seperate node for the PatientVisit. Thank you so much. But what would be the query then?
07-23-2022 03:33 PM
I want to review it some more and look for optimizations, but you can give it a try to see what results you get.
match(p:patient)-[:HAS_RESULT]->(apr:apachepatientresult)
where apr.apacheversion = 'IVa' and apr.actualiculos >= 1
with p.uniquepid as uniquepid, min(p.patienthealthsystemstayid) as patienthealthsystemstayid
with uniquepid, patienthealthsystemstayid
match(p:patient)
where p.patienthealthsystemstayid = patienthealthsystemstayid
with p.uniquepid as uniquepid, patienthealthsystemstayid, min(p.unitvisitnumber) as unitvisitnumber
with patienthealthsystemstayid, unitvisitnumber
match(p:patient)-[:HAS_RESULT]->(apr:apachepatientresult)
where apr.apacheversion = 'IVa' and apr.actualiculos >= 1
and p.patienthealthsystemstayid = patienthealthsystemstayid
and p.unitvisitnumber = unitvisitnumber
return p
07-24-2022 07:00 AM
I don't think there is a way around matching the patient nodes three times. The following query is the same, but breaking out the part that calculates the 'patienthealthsystemstayid' and 'unitvisitnumber' into a 'call' subquery makes it more understandable.
call {
match(p:patient)-[:HAS_RESULT]->(apr:apachepatientresult)
where apr.apacheversion = 'IVa' and apr.actualiculos >= 1
with p.uniquepid as uniquepid, min(p.patienthealthsystemstayid) as patienthealthsystemstayid
match(p:patient)
where p.patienthealthsystemstayid = patienthealthsystemstayid
with p.uniquepid as uniquepid, patienthealthsystemstayid, min(p.unitvisitnumber) as unitvisitnumber
return patienthealthsystemstayid, unitvisitnumber
}
match(p:patient)-[:HAS_RESULT]->(apr:apachepatientresult)
where apr.apacheversion = 'IVa' and apr.actualiculos >= 1
and p.patienthealthsystemstayid = patienthealthsystemstayid
and p.unitvisitnumber = unitvisitnumber
return p
By the way, I believe your original algorithm can produce more than one episode per patient. This can happen when a patient has the same combination of patienthealthsystemstayid and unitvisitnumber that is the same as another patient's minimum combination, but is not the same as this patient's combination. I refactored the above query to eliminate that possibility by returning only one record per patient which is the one that has their minimum combination of patienthealthsystemstayid and unitvisitnumber.
match(p:patient)-[:HAS_RESULT]->(apr:apachepatientresult)
where apr.apacheversion = 'IVa' and apr.actualiculos >= 1
with p.uniquepid as uniquepid, min(p.patienthealthsystemstayid) as patienthealthsystemstayid
call {
with uniquepid, patienthealthsystemstayid
match(p:patient)
where p.uniquepid = uniquepid and p.patienthealthsystemstayid = patienthealthsystemstayid
return min(p.unitvisitnumber) as unitvisitnumber
}
match(p:patient)
where p.uniquepid = uniquepid
and p.patienthealthsystemstayid = patienthealthsystemstayid
and p.unitvisitnumber = unitvisitnumber
return p
07-24-2022 07:56 AM
Thank you so much. the first query you posted works fine. And after creating an index of the node attributes, it is also much faster.
All the sessions of the conference are now available online