Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
12-19-2018 08:48 AM
I have a simple model in which a Patient has many events. Each event has a start date and a category like 'medication'.
It is normal for a patient to have 10s of events of each category.
I would like to find patients where:
a measurement event happened after an admission
and an operation event happened after the same admission event
and the admission event itself happened after a medication event
I've modelled the data in neo4j like:
(:Patient {id: "ABCD"})-[:HAS]->(:Event {startDate: '2018-01-01', category: 'medication'})
Which produces a graph like:
The query I've come up with is:
MATCH p1 = (measurement:Event {category: 'measurement'})<-[:HAS]-(patient:Patient)
MATCH p2 = (admission:Event {category: 'admission'})<-[:HAS]-(patient)
MATCH p3 = (operation:Event {category: 'operation'})<-[:HAS]-(patient)
MATCH p4 = (medication:Event {category: 'medication'})<-[:HAS]-(patient)
WHERE date(measurement.startDate) > date(admission.startDate)
AND date(operation.startDate) > date(admission.startDate)
AND date(admission.startDate) > date(medication.startDate)
RETURN collect(distinct patient.pseudoId);
However this query times out with my test dataset of 1000 patients on my machine. The Explain produces:
Is there more efficient way to model or query the data in neo4j?
I've uploaded my test data here.
12-19-2018 08:59 AM
Some ideas to improve the data model:
category
property into a second label for :Event
nodes, so you have e.g. :Event:Measurement
nodes.[:AFTER]
relationships between Events
of the same person. So you have a timeline of that patient explicitly and don't need to compare dates - that approach is what I call "graph thinking" in contrast to "relational thinking" (comparing dates)12-20-2018 01:41 AM
Thanks for the ideas @stefan.armbruster.
Can you point to any examples of a model that has :AFTER
? This approach sounds promising but we also have queries that want to know events that happened before or on the same day/month/year as another event. So I'm concerned that it will be complicated to create the graph and the cost will also be high.
12-20-2018 01:50 AM
On important factor for a decision is IMO if you need to do cross-patient comparisions on dates.
If you only do this in a non-cross-patient way I'd maintain for each patient a chain of events. If "same day" is an important aspect of your domain, you can overlay the rather generic AFTER
relationship with SAME_DAY
relationships.
12-20-2018 01:54 AM
Yeah we only need to do this in a non-cross-patient way.
If you only do this in a non-cross-patient way I'd maintain for each patient a chain of events. If "same day" is an important aspect of your domain, you can overlay the rather generic
AFTER
relationship withSAME_DAY
relationships.
Ok, I'll give it a shot.
12-20-2018 04:13 AM
You can also drive the query with a USING SCAN ON patient
so it doesn't do the index lookups but instead filters.
12-20-2018 05:21 AM
Hmm I played around a bit with it, my original ideas (turning the categories into labels or rel-types) didn't help so much due to the 4x expand that the planner does.
It's much better to do the predicate on a collection per patient.
profile
MATCH (patient:Patient)
WITH patient, [(patient)-[:HAS]->(e:Event) WHERE e.category IN ['measurement','admission','operation','medication'] | e] as events
WITH patient, apoc.map.groupByMulti(events,'category') as grouped
WHERE all(pair IN [['admission','measurement'],['admission','operation'],['medication','admission']] WHERE ANY(first IN grouped[pair[0]] WHERE ANY(second IN grouped[pair[1]] WHERE first.startDate < second.startDate)))
RETURN size(collect(distinct patient.pseudoId));
// Started streaming 1 records after 85 ms and completed after 85 ms.
// 959
TODO does it have to be the same admission or not?
If yes then the predicate has to be adapted a bit.
profile
MATCH (patient:Patient)
WITH patient, [(patient)-[:HAS]->(e:Event) WHERE e.category IN ['measurement','admission','operation','medication'] | e] as events
WITH patient, apoc.map.groupByMulti(events,'category') as grouped
WHERE any(admission IN grouped['admission'] WHERE
ANY(medication IN grouped['medication'] WHERE admission.startDate > medication.startDate) AND
ANY(operation IN grouped['operation'] WHERE admission.startDate < operation.startDate) AND
ANY(measurement IN grouped['measurement'] WHERE admission.startDate < measurement.startDate)
)
RETURN size(collect(distinct patient.pseudoId));
// Started streaming 1 records after 127 ms and completed after 127 ms.
// 875
12-20-2018 07:50 AM
Thanks @michael.hunger, the query you suggested is much faster and seems to scale as more filters are added. I'll continue to investigate it.
All the sessions of the conference are now available online