Neo4j

kasthuri · ‎03-13-2020

I have a Neo4j network in which each edge/relationship represents a group of people. The list of those people is stored as one of the edge's properties. Each of those people has further attributes, such as the treatment they took and the disease they have. How do we best represent this? One idea is to create an in-between node for each edge and store these attributes as that node's properties. Is there a better way than creating an in-between node? Thanks.

andrew_bowman · ‎03-13-2020

Okay, so we're going to need to create :Patient nodes. You will need an index (or unique constraint, if that's appropriate) on something like :Patient(id) before you do this. This assumes that the same patient may be involved in multiple incidents and not just one; the index lets the query find patients quickly instead of having to scan all patient nodes.
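As a rough sketch of that preliminary step, the statements below show one way to set up the constraint or index. The constraint and index names are made up for illustration, and the exact syntax depends on your Neo4j version, so treat this as a starting point rather than the definitive form.

```cypher
// Neo4j 4.4+ / 5.x syntax. The name patient_id_unique is a made-up example.
CREATE CONSTRAINT patient_id_unique IF NOT EXISTS
FOR (p:Patient) REQUIRE p.id IS UNIQUE;

// Alternative if ids are not guaranteed to be unique: a plain index
// (the name patient_id_index is also just illustrative).
// CREATE INDEX patient_id_index IF NOT EXISTS FOR (p:Patient) ON (p.id);

// Older releases (3.5, 4.0) use this form instead:
// CREATE CONSTRAINT ON (p:Patient) ASSERT p.id IS UNIQUE;
// CREATE INDEX ON :Patient(id);
```

A unique constraint also creates a backing index, so you only need one of the two.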

You haven't told us the relationship type, so for this we'll just use :LED_TO for the type. Let's use :Incident for the in-between node.
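If you want to try this on a scratch database first, a tiny fixture along these lines should be enough. It assumes the :Event label and :LED_TO type chosen above, and that patients is stored as a list property; the name property and its values are placeholders that do not come from your data.

```cypher
// Two events joined by one relationship that carries the patient list.
// The name property and its values are placeholders.
CREATE (a:Event {name: 'event A'})
CREATE (b:Event {name: 'event B'})
CREATE (a)-[:LED_TO {patients: ['p1', 'p2']}]->(b)
```

After the refactoring, you would expect event A to point to a single :Incident node, which in turn points to event B and has :HAS_PATIENT relationships to two :Patient nodes with ids p1 and p2.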

The basic query would be something like:

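// Find every direct event-to-event relationship that carries a patient list
MATCH (start:Event)-[rel:LED_TO]->(end:Event)
// Put an :Incident node in between the two events
CREATE (start)-[:LED_TO]->(inc:Incident)-[:LED_TO]->(end)
// Attach one :Patient node per id, reusing the node if that patient already exists
FOREACH (patientId IN rel.patients |
   MERGE (p:Patient {id:patientId})
   CREATE (inc)-[:HAS_PATIENT]->(p))
// Remove the original relationship now that the incident replaces it
DELETE rel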
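One caveat: FOREACH over rel.patients only works if the property is a real list. If it turns out to be a single comma-separated string (the "p1,p2" example in the question could be read either way), a variant like the following might do the job. It is only a sketch under that assumption, using the built-in split() and trim() functions to build the list first.

```cypher
// Assumption: rel.patients is one string such as "p1,p2", not a list.
MATCH (start:Event)-[rel:LED_TO]->(end:Event)
// Split on commas and strip stray whitespace around each id
WITH start, rel, end, [id IN split(rel.patients, ',') | trim(id)] AS patientIds
CREATE (start)-[:LED_TO]->(inc:Incident)-[:LED_TO]->(end)
FOREACH (patientId IN patientIds |
   MERGE (p:Patient {id: patientId})
   CREATE (inc)-[:HAS_PATIENT]->(p))
DELETE rel
```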

However, if there are many such matches in your graph, you should batch this so that you don't run out of heap or hit GC issues from holding all pending changes in memory until the atomic commit.

You can use apoc.periodic.iterate() from APOC Procedures to do this.

Something like:

CALL apoc.periodic.iterate("MATCH (start:Event)-[rel:LED_TO]->(end:Event) RETURN start, rel, end",
"CREATE (start)-[:LED_TO]->(inc:Incident)-[:LED_TO]->(end)
FOREACH (patientId IN rel.patients | 
   MERGE (p:Patient {id:patientId})
   CREATE (inc)-[:HAS_PATIENT]->(p))
DELETE rel", {}) YIELD batches, total, errorMessages
RETURN batches, total, errorMessages
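Once that has run, a couple of read-only checks along these lines could confirm the result. They assume the same labels and relationship types as above; run them as separate statements.

```cypher
// Any event-to-event relationship that still has a patients list was not refactored.
MATCH (:Event)-[rel:LED_TO]->(:Event)
WHERE rel.patients IS NOT NULL
RETURN count(rel) AS remaining;

// Number of patients attached to each incident between two events.
MATCH (start:Event)-[:LED_TO]->(inc:Incident)-[:LED_TO]->(end:Event)
OPTIONAL MATCH (inc)-[:HAS_PATIENT]->(p:Patient)
RETURN start, end, count(p) AS patientCount;
```

If remaining is greater than zero, check errorMessages from the apoc.periodic.iterate() call for failed batches.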


andrew_bowman · ‎03-13-2020

In this case an in-between node makes sense. What you encountered is a sign that you're missing an important entity in your graph, such as a :Conference, :Meeting, or :Appointment node. You can connect the :Person nodes (which hold that person's properties and may connect to further nodes, like :Disease or :Treatment) to that new node.

So think about what an instance of this new node would symbolize in the real world, and consider if that's something that is important enough to capture as a node in your graph.
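To make that concrete, here is a minimal sketch of how a person node could branch out to further nodes. It uses :Patient for the person nodes, as in the accepted answer at the top of the thread; the :HAS_DISEASE and :RECEIVED relationship types, the name property, and all values are hypothetical and only there for illustration.

```cypher
// Hypothetical names: HAS_DISEASE, RECEIVED, the name property, and its values.
MATCH (p:Patient {id: 'p1'})
// MERGE reuses an existing disease/treatment node instead of duplicating it
MERGE (d:Disease {name: 'example disease'})
MERGE (t:Treatment {name: 'example treatment'})
MERGE (p)-[:HAS_DISEASE]->(d)
MERGE (p)-[:RECEIVED]->(t)
```

Modeling diseases and treatments as shared nodes rather than as properties means patients with the same disease or treatment end up connected through it, which tends to make those questions easy to query later.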

kasthuri · ‎03-13-2020

Thanks, andrew.bowman. Actually, all the nodes represent molecular events and the edges are the patients associated with those events. This new in-between node would technically capture clinical events. But, certainly yes, this new node will branch into further nodes that characterize each patient in terms of clinical information. So it could be a short tree hanging from each edge.

kasthuri · ‎03-13-2020

Any chance you could suggest what a sample Cypher query for creating an in-between node from an edge property that lists patients would look like?

For example, let's say the edge has the property:
patients: p1,p2

I need to create an in-between node and two child nodes, one each for p1 and p2. And I need to do this for all edges.

Thanks!

andrew_bowman · ‎03-13-2020

Some preliminaries here: are the entries in the patient list unique identifiers of some kind for a patient? Do you have :Patient nodes (or something similar enough) to use for this, or will these be new? And if you have existing nodes, do you have an index on them for quickly looking them up by whatever property is in the patient list?

kasthuri · ‎03-13-2020

Yes, they are unique identifiers, meaning they are different patients. When I created them, I made sure they were unique. I don't have :Patient nodes. Ideally, I would want to create an in-between node and have these patient identifiers as labels of child nodes. Later I would want to add other properties for the patients by matching. Something like this, but there could be more or fewer than 3. And I need this for all edges, since the edges are built wherever there are common patients.

andrew_bowman · ‎03-13-2020