11-25-2019 09:03 AM
Hi everyone,
I'm new to Neo4j. I’m doing an initial investigation to understand whether Neo4j is suitable for my problem and, if so, how to approach it technically.
I have events that have happened to patients, and the order in which they occur is relevant. I have tens of thousands of patients, and each patient can have tens of events. I need to group these patients by their events and the order in which they occur, to understand what the most common path of events is. The diagram below hopefully elaborates. I will also want to filter patients by, say, their gender, but I think this is fairly straightforward. Also, is this a fairly trivial problem for Neo4j in terms of performance?
I’ve had a play around with the movie graph example and I think I need to use visual graph grouping. What I’ve struggled to find is an example of grouping while maintaining the graph hierarchy.
Thanks,
Ali
11-25-2019 02:01 PM
Grouping
CREATE (David:Person {name:'David'})
CREATE (Ali:Person {name:'Ali'})
CREATE (Danny:Person {name:'Danny'})
CREATE (Amin:Person {name:'Amin'})
CREATE (A1_1:Event_01 {name:'A'})
CREATE (A1_2:Event_01 {name:'A'})
CREATE (B1:Event_01 {name:'B'})
CREATE (C1:Event_01 {name:'C'})
CREATE (D2_1:Event_02 {name:'D'})
CREATE (D2_2:Event_02 {name:'D'})
CREATE (A2:Event_02 {name:'A'})
CREATE (E3:Event_03 {name:'E'})
CREATE (C3:Event_03 {name:'C'})
CREATE
(David)-[:NEXT]->(A1_1), (A1_1)-[:NEXT]->(D2_1), (D2_1)-[:NEXT]->(E3),
(Ali)-[:NEXT]->(A1_2), (A1_2)-[:NEXT]->(D2_2), (D2_2)-[:NEXT]->(C3),
(Danny)-[:NEXT]->(B1), (B1)-[:NEXT]->(A2),
(Amin)-[:NEXT]->(C1)
;
call apoc.nodes.group(['Event_01','Event_02', 'Event_03'],['name']);
11-25-2019 05:05 PM
It depends. I think we need you to say more about how you're planning to use this.
Note that the events don't have to be linked to one another. If they have dates on them, you can date order and still get them in the right order, even if all of the events are linked to the same patient.
One thing that's not clear to me about the question is if the events cluster in any way. Imagine a patient who comes to get diabetes treated (this could create a "thread" of many events/encounters) but who separately is being treated for a comorbidity like heart disease (this could create an only marginally related "thread" of other events/encounters). Do you have one thread per patient, or many? You might consider creating a node per thread, and then link all of the events to the "Thread node". Then link all of the threads to the patient, meaning you have a hierarchy, while retaining order of the individual event sequences.
11-26-2019 06:12 AM
Hi David,
Thanks for replying.
> I think we need you to say more about how you're planning on using this.
The plan is to use something like d3.js to create dynamic visualisations that show the most common path taken by the patients, possibly a graph and sankey diagram combined. The user may then choose to filter, e.g. by gender, and the visualisation updates accordingly.
> Note that the events don't have to be linked to one another. If they have dates on them, you can date order and still get them in the right order, even if all of the events are linked to the same patient.
It would be good to know how to do this; it would save me some upfront data wrangling.
> Do you have one thread per patient, or many?
At the moment, it's one thread per patient.
Thanks,
Ali
11-26-2019 07:19 AM
I'm making this up, but this is what I mean by a "Thread".
CREATE (p:Patient { name: "Bob" })
CREATE (t:Thread { name: "Diabetes" })
CREATE (e1:Event { name: "Do a Thing", date: date("2019-09-30") })
CREATE (e2:Event { name: "Do another thing", date: date("2019-10-01") })
CREATE (p)-[:THREAD]->(t)
CREATE (t)-[:EVENT]->(e1)
CREATE (t)-[:EVENT]->(e2)
Bob has a diabetes thread where he did 2 things. Each has a date. By doing:
MATCH (t:Thread)-[:EVENT]->(e:Event)
RETURN e
ORDER BY e.date
You'll never lose the ordering of the events. But notice that the events aren't connected to one another; they're grouped by a "Thread" node. To order things temporally, all you need is a "date" property; you don't need relationships between events.
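As a rough sketch of how that date ordering could feed a grouping step, the query below collects each thread's event names in date order and then counts how many threads share the same ordered list. It assumes only the Patient/Thread/Event labels and the THREAD/EVENT relationships from the example above; the aliases path and threads are made up for illustration.

```cypher
// Sketch only: assumes the Patient/Thread/Event model from the example above.
MATCH (p:Patient)-[:THREAD]->(t:Thread)-[:EVENT]->(e:Event)
WITH p, t, e
ORDER BY e.date
// collect() gathers the names in the row order set by ORDER BY
WITH p, t, collect(e.name) AS path
// group identical ordered lists and count the threads that follow each one
RETURN path, count(*) AS threads
ORDER BY threads DESC
```

If two events in a thread can share a date, the order between them is not defined by this query, so a finer-grained timestamp or a tie-breaker property would be needed.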
11-26-2019 10:36 AM
David,
I've rewritten my example with dates, as you've described (I haven't added Thread). How do I then do the grouping to get the result I want?
CREATE (David:Person {name:'David'})
CREATE (Ali:Person {name:'Ali'})
CREATE (Danny:Person {name:'Danny'})
CREATE (Amin:Person {name:'Amin'})
CREATE (DavidA:Event {name:'A', date: date("2019-09-01")})
CREATE (DavidD:Event {name:'D', date: date("2019-09-02")})
CREATE (DavidE:Event {name:'E', date: date("2019-09-03")})
CREATE (AliA:Event {name:'A', date: date("2019-09-04")})
CREATE (AliD:Event {name:'D', date: date("2019-09-05")})
CREATE (AliC:Event {name:'C', date: date("2019-09-06")})
CREATE (DannyB:Event {name:'B', date: date("2019-09-01")})
CREATE (DannyA:Event {name:'A', date: date("2019-09-02")})
CREATE (AminC:Event {name:'C', date: date("2019-09-01")})
CREATE
(David)-[:NEXT]->(DavidA), (David)-[:NEXT]->(DavidD), (David)-[:NEXT]->(DavidE),
(Ali)-[:NEXT]->(AliA), (Ali)-[:NEXT]->(AliD), (Ali)-[:NEXT]->(AliC),
(Danny)-[:NEXT]->(DannyB), (Danny)-[:NEXT]->(DannyA),
(Amin)-[:NEXT]->(AminC)
;
Thanks,
Ali
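One possible way to do the grouping on this date-based model, offered as a sketch rather than a definitive answer: build each person's ordered list of event names, expand it into all of its leading sub-paths (A, then A,D, then A,D,E), and count how many people share each one. It assumes the Person and Event labels, the NEXT relationship, and the name and date properties from the CREATE statements above; the aliases path, len, prefix and patients are illustrative.

```cypher
// Sketch only: assumes every Event hangs directly off its Person via NEXT.
MATCH (p:Person)-[:NEXT]->(e:Event)
WITH p, e
ORDER BY e.date
WITH p, collect(e.name) AS path
// one row per leading sub-path of each person's event list
UNWIND range(1, size(path)) AS len
WITH path[0..len] AS prefix, count(*) AS patients
RETURN prefix, patients
ORDER BY size(prefix), patients DESC
```

Each returned row corresponds to one node in the merged tree of paths, with patients as its count, which is the shape a tree or sankey visualisation would need.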
11-26-2019 09:14 AM
I would model this in Neo4j as you have illustrated above.
Then, to count all sequence occurrences (including subsequences), you can expand all sequences with a recursive path expansion Cypher query:
match p=(firstevent:Event)-[:NEXT*]->(:Event)
where not((firstevent)<-[:NEXT]-(:Event)
return p as sequence, count(1) as occurences
Finally, you can create a suffix tree from the results of this query.
You should also be able to easily create a sankey diagram from the results of the above query.
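For the sankey part, a minimal sketch of a query that returns one row per transition between event names, with the number of times it occurs. It assumes events carry a shared Event label and are chained with NEXT relationships, as in the query above; the aliases source, target and value are chosen to resemble the link shape sankey layouts commonly expect and are not taken from the thread.

```cypher
// Sketch only: assumes (:Event)-[:NEXT]->(:Event) chains, one chain per patient.
MATCH (a:Event)-[:NEXT]->(b:Event)
// group by the pair of event names and count how often each transition occurs
RETURN a.name AS source, b.name AS target, count(*) AS value
ORDER BY value DESC
```

Note that this groups by event name only, so an A in first position and an A in second position are treated as the same sankey node. Keeping them apart would require a position or step property on the events.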
11-26-2019 10:37 AM
Hi Niclas,
Are you referring to my solution or David's solution?
Thanks,
Ali
11-26-2019 10:42 AM
Hi,
Sorry for the confusion. I was referring to your original post, where you illustrate the sequences.
/Niclas
11-26-2019 01:02 PM
Thanks for clarifying Niclas.
It doesn't give precisely what I want. There are two A--->D subgraphs and node C is missing.
Also, there was a missing bracket:
match p=(firstevent:Event)-[:NEXT*]->(:Event)
where not((firstevent)<-[:NEXT]-(:Event))
return p as sequence, count(1) as occurences
What I need is as follows: a count property on each node in A--->D, with a value of 2 on each.
Ali
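One way those counts might be obtained, sketched against the chained sample data from the first reply (Person nodes linked to a chain of events through NEXT): start every path at the person, so that sequences shared by several patients collapse into the same list of names and can be counted. The aliases prefix, event and patients are illustrative, and the event nodes are matched without a label because the sample uses Event_01, Event_02 and Event_03.

```cypher
// Sketch only: assumes (:Person)-[:NEXT]->(event)-[:NEXT]->(event)... chains.
MATCH p = (person:Person)-[:NEXT*]->(e)
// drop the Person node, keep the event names along the path
WITH [n IN tail(nodes(p)) | n.name] AS prefix,
     count(DISTINCT person) AS patients
RETURN prefix, last(prefix) AS event, patients
ORDER BY size(prefix), patients DESC
```

On that sample data, loaded once, the rows for A and for A followed by D should each have patients = 2, while C following A,D appears as its own row with a count of 1.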