cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

How to model many:many relationships

Please forgive the newbie question, but I just cannot see how to model this in Neo4j, but there must be a simple answer. Say I have 2 items, 2 locations, and 2 suppliers. Each supplier can supply each item to each location, but their unit price and lead time is different.
Item Location Supplier Lead Time Unit Price
1 1 1 1 1
1 1 2 2 10
1 2 1 3 2
1 2 2 4 20
2 1 1 5 3
2 1 2 6 30
2 2 1 7 4
2 2 2 8 40

I get that I can create the relationships along the lines of
(s:Supplier)-[sr:SUPPLIES]->(i:Item)-[tr:TO]->(l:Location)
but where do I store the lead time and unit cost information, on sr or tr?

If I put these values on sr how do i distinguish between the same supplier supplying different locations? If I put it on tr how do I distinguish between different suppliers suppling to the same location?

Or is this the magic of Neo4j?

17 REPLIES 17

Your modeling isn't sufficient to capture what you want.

While it is true that a supplier supplies an item to a location, you've missed out on some additional concept, call it a :ItemShipment maybe, an instance of a shipment from a supplier to a location of an item (or items, if you like). This should probably replace :Item in your model, but have a relationship to the :Item node.

(:Supplier)-[:SUPPLIES]->(:ItemShipment)-[:TO]->(:Location),
(:ItemShipment)-[:INCLUDES]->(:Item)

This is one possible way to model this. You could then have the info you need on the :ItemShipment node (or possibly on the :INCLUDES relationship (or whatever you want in place of this) from the shipment to the :item in question).

Notice that this now gives context to your :SUPPLIES and :TO relationships. Provided that only a single :Supplier can produce an :ItemShipment, and that an :ItemShipment may only go to one :Location, an :ItemShipment must therefore only have a single :SUPPLIES relationship and a single :TO relationship. You no longer had the disconnection of context that you had when there were multiple :SUPPLIES and :TO relationships per :Item node (since with the old model there was no way to tell which :TO relationship corresponded with the right :SUPPLIES relationship (and :Supplier node), and vice versa).

Thanks Andrew. But I'm trying to model the generic case so that I can then determine the shortest/cheapest supply. And of course the model is a lot more complex where a supplier could be supplying around 500 parts to over 20 locations, each with its own lead time, and I could have as many as 3 suppliers supplying the same part to a particular location.

So I'm not trying to model what DID the supplier supply, but what COULD the supplier supply. Or, more correctly, what is the cheapest route that will get me supply of several items to a particular location with most of the items arriving on time.

I believe you have given me the correct construct. I can use :StdTerms instead of :Shipment, and a :Shipment is constructed dynamically/temporally and consists of multiple :StdTerms.

It gets more complex though, because I still need to consider whether the supplier has sufficient inventory to satisfy my order. But I'll try to model this on my own before asking more questions.

Andrew, the more I think about it, the less I understand the use of something like :StdTerms. How would I determine which suppliers can deliver an item to a location?
I am 100% certain that the Cypher code below contains at least 1 error, very likely several more.

MATCH (l:Location)<-(rt:TO)<-(t:StdTerms)<-(rc:CONTAINS)<-(i:Item) 
WHERE i.key = ItemID AND l.key = LocationID
WITH rt
MATCH (s:Supplier)->(rt)
RETURN s, t

In this one I think you're confusing relationships and nodes. :TO and :CONTAINS would be relationships connecting the adjacent nodes:

(l:Location)<-[:TO]-(t:StdTerms)-[:CONTAINS]->(i:Item) 

, which means :Supplier can't have a relationship to rt (you can't have relationships to relationships).

If we're still using a model similar to the one I proposed, then you could have (s:Supplier)-[:SUPPLIES]->(t) (and you'd want to change your WITH clause to be WITH t so that it stays in scope).

Sorry Andrew, I used the incorrect syntax. As per your suggestion, the relationships would be

(:Supplier)-[:SUPPLIES]->(:StdTerms)-[:TO]->(:Location)
(:StdTerms-[:CONTAINS]->(:Item)

My syntax should have been

MATCH (l:Location)<-[rt:TO]<-(t:StdTerms)-[rc:CONTAINS]->(i:Item) 
WHERE i.key = ItemID AND l.key = LocationID
WITH rt
MATCH (s:Supplier)->[:rt]->()
RETURN s, t

Is this really the most efficient way? I can't help thinking that running the shortestPath algo to determine either the quickest or cheapest source of an item would be expensive.

ri8ika
Graph Voyager

I would define relationship properties for this scenario.

(s:Supplier)-[sr:SUPPLIES {leadtime: timestamp()}]->
(i:Item {unitprice: 200})-[tr:TO]->(l:Location)

That likely won't work out. Or at the least, it would mean that :Item nodes are not distinct per item, but instances of an item at a unitprice (so the same item might show up in another :Item node at a different unitprice).

Also you'll have to consider whether this is also an instance per location/supplier, meaning an :Item node would only have a single incoming :SUPPLIES relationship and a single outgoing :TO relationship. I'd think you'd have to model it this way, to keep the context of which supplier is supplying the item to where. (You would lose that context if multiple :SUPPLIES and/or :TO relationships were allowed on the :Item).

And with that modeling, it becomes nearly identical to my suggestion of using an :ItemShipment node (or a :StdTerms node), with perhaps a difference that the item information can go on this node rather than on a connected :Item node. And yes that could work. Just you would no longer have a distinct :Item node per item, because each would be a instance per terms and shipment (supplier/location).

Yes, but what happens when the lead time for the same item to a different location is different? The unit price may change between locations too, because the location may be a different customer.

So if you are telling me that Neo4j keeps unique connections between Supplier, Item, and Location automatically then this is what I am looking for. My understanding is that if I run

MERGE (s:Supplier {key: 'sA'}) - [sr:SUPPLIES {leadtime: 1}]->(i:Item {key: 'iA', unitcost: 20}) - [tr: TO] -> (l:Location {key: 'lA'})
MERGE (s:Supplier {key: 'sA'}) - [sr:SUPPLIES {leadtime: 10}]->(i:Item {key: 'iA', unitcost: 25}) - [tr: TO] -> (l:Location {key: 'lB'})
MERGE (s:Supplier {key: 'sB'}) - [sr:SUPPLIES {leadtime: 2}]->(i:Item {key: 'iA', unitcost: 30}) - [tr: TO] -> (l:Location {key: 'lA'})
MERGE (s:Supplier {key: 'sB'}) - [sr:SUPPLIES {leadtime: 20}]->(i:Item {key: 'iA', unitcost: 35}) - [tr: TO] -> (l:Location {key: 'lB'})

When I test this code it generates 4 :Supplier nodes and 4 :Item nodes. There are only 2 of each.

Don't MERGE the entire pattern. MERGE is like a MATCH, and if the entire pattern isn't found, then it does a CREATE of the entire pattern. That's why you're seeing duplicates.

Instead, MERGE the :Supplier and :Location nodes separately, then MERGE the pattern between them (since we want to make sure we don't reuse :Item nodes with the same key and unitcost that are already connected to different :Supplier and :Location nodes (or that have those same connecting nodes, but different leadtime).

MERGE (sa:Supplier {key: 'sA'})
MERGE (sb:Supplier {key: 'sB'})
MERGE (la:Location {key: 'lA'})
MERGE (lb:Location {key: 'lB'})
MERGE (sa) - [sr:SUPPLIES {leadtime: 1}]->(i:Item {key: 'iA', unitcost: 20}) - [tr: TO] -> (la)
MERGE (sa) - [sr:SUPPLIES {leadtime: 10}]->(i:Item {key: 'iA', unitcost: 25}) - [tr: TO] -> (lb)
MERGE (sb) - [sr:SUPPLIES {leadtime: 2}]->(i:Item {key: 'iA', unitcost: 30}) - [tr: TO] -> (la)
MERGE (sb) - [sr:SUPPLIES {leadtime: 20}]->(i:Item {key: 'iA', unitcost: 35}) - [tr: TO] -> (lb)

And if you're creating these for the first time, go ahead and use CREATE (at least for the relationship parts).

Also make sure you have indexes up on :Supplier(key) and :Location(key).

Thanks ANdrew. Guess I will need to do the same for :Item - CREATE and INDEX - before creating the edges. In your code above I guess there will be 4 :Item nodes?

I will try

CREATE (ia:Item {key: 'iA'})
CREATE (ib:Item {key: 'iB'})

CREATE (sa) - [sr:SUPPLIES {unitcost: 20}] -> (ia) - [tr: TO{leadtime: 1}] -> (la)

Hopefully I will get unique edges.

I'm not sure this is what you want.

You need to decide if :Item nodes should be instances (where you could have multiple :Item nodes with the same key) or unique per key.

The idea with the modeling suggestions I gave is that you have instance nodes (either :ItemShipment or :stdTerms or whatever) such that there is every only one incoming :SUPPLIES relationship and one outgoing :TO relationship. This is important because you retain context between the supplier and location, as well as the unitcost and leadtime. You know that the :TO relationship, for example, can only be associated with the single :SUPPLIES relationship (and unitcost) for that item instance, and likewise ot the single linked supplier.

Basically you don't want to use the same :Item nodes to have multiple relationships to multiple suppliers and locations.

Andrew, yes, you are correct. I need unique keys for :Item. The code above produces two edges between :Supplier and :Item for each combination, which is incorrect.

The issue is with :StdTerms is that I need a unique :StdTerms for each :Item, but even then I lose the :Location. Wouldn't it be better to model it as follows:

(:Supplier)-[:SUPPLIES]->(:StdTerms)-[:TO]->(:Location)
(:StdTerms)-[:FOR]->(:Item)->[DELIVERED_TO {stdLT: 10, stdCost: 100}]->(:Location)

Alternatively

(:Supplier)-[:MAKES]->(:Item)
(:Supplier)-[:SUPPLIES {stdLT: 10}]->(:Location)
(:Location)-[:BUYS {stdCost: 100}]->(:Item)

The issue in this case is that there can only be one lead time between a :Supplier and a :Location.

I'm not sure if that model works either, but then I'm still trying to grasp the requirements here.

If lead time is constant between each :Supplier and :Location they supply to (regardless of items) then it makes sense to have (:Supplier)-[:SUPPLIES {stdLT: 10}]->(:Location) in the graph, one relationship per :Supplier / :Location combination where such shipping exists.

This proposal though doesn't seem correct:

(:StdTerms)-[:FOR]->(:Item)->[:DELIVERED_TO {stdLT: 10, stdCost: 100}]->(:Location)

Since you want :Item nodes to be unique in your graph for an item, you have lost the association of which :Supplier is associated with the stdLT and stdCosts on the :DELIVERED_TO relationship (assuming there are multiple :Suppliers with :StdTerms for the same item). And likewise, if some :Suppliers ship an item to some :Locations but not others, you've lost association there for where :Suppliers can deliver the item (and again, you have no idea which of the costs and lead times are associated with which suppliers).

As for (:Supplier)-[:MAKES]->(:Item), that can work, but I'm not sure how well that fits your querying needs.

As for (:Location)-[:BUYS {stdCost: 100}]->(:Item), which supplier did they buy from? What was the lead time for that shipment? This modeling alone isn't capturing that association, which I think is why you need some kind of instance node (:StdTerms? :ItemShipment?) between a :Supplier and a :Location, and also for an :Item, so that you never lose the association between the supplier, the location supplied to, the item, the lead time, and the cost.

Think of that as the "quote", what the lead time and cost will be for a specific supplier to ship a specific item to a specific location.

If you need to additionally model actual transactions based on this, that would require extra modeling. Perhaps a :Purchase node, with relationships to at the least the :StdTerms, so via traversal you can identify the terms, the item, the supplier, and the location, with the :Purchase node holding additional info on the transaction (how many were bought, if the payment went through, if/when the purchase was fulfilled, etc). And if you want direct relationships from the :Purchase node to the relevant nodes in question (to directly connect the buyer and seller and maybe the item to the :Purchase) there's no reason you can't do that as well, to simplify your queries regarding purchases.

Andrew, I agree. I have been struggling to get the concept of :StdTerms to fit my needs.

Let me try to abstract the problem and describe it in RDBMS terms.

  • I have three tables, each with unique indices which we can call idxA, idxB, and IdxC
  • I have a fourth table that contains values that are unique to each unique combination of idxA, idxB, and idxC that exists
  • Not all combinations of idxA, idxB, and idxC exist
  • We can deduce relationships between tables A, B, and C through the indices.

I see plenty of Neo4j examples of a table with 2 indices, but I haven't seen any of 3 or more.

This is what I am trying to model.

Well there are two ways you can go about this, and this isn't an either/or, you could do both if it fits the needs of different kinds of queries.

The model I proposed is one way: a :StdTerms node that has relationships to both :Supplier, :Location, and :Item:

(:Supplier)-[:SUPPLIES]->(:StdTerms)-[:TO]->(:Location),
(:StdTerms)-[:FOR]->(:Item)

An example query, get the 5 cheapest suppliers of steel to Greenland, and order them by lead time ascending, listing supplier name, stdCost, and stdLT:

MATCH (:Item {name:'Steel'})<-[:FOR]-(st:StdTerms)-[:TO]->(:Location {name:'Greenland'})
// USING JOIN ON st      // you could see if using a join hint is better or worse for performance...this expands from both item and location, using index lookups for both if such indexes exist
WITH st
ORDER BY st.stdCost DESC
LIMIT 5
MATCH (supp:Supplier)-[:SUPPLIES]->(st)
RETURN supp.name as supplier, st.stdCost as cost, st.stdLT as leadTime
ORDER BY leadTime ASC

The :StdTerms node would be like the fourth table you described, with values unique to each unique combination of three foreign keys. In a property graph, relationships are used in place of foreign keys as connections between entities. As long as we never have more than one of each of those 3 relationship types to the nodes in question, this anchors a :StdTerms node to 1 specific item, 1 specific location, and 1 specific supplier.

And if you do want to use an index lookup as opposed to graph traversal and filtering, there's no reason you can't add all 3 properties redundantly onto the node itself and add composite indexes on all 3 properties, and perhaps on each of the 3 pairs as well (item and location, item and supplier, supplier and location).

I suppose there are some other questions though whose answers might require changes in the model.

Does the cost for a specific item from a specific supplier remain constant no matter the location, or does it change per location? If it's constant, then a :StdTerms node may indeed have multiple :TO relationships (with the relationship to a location having the stdLT).

Thanks Andrew. This is very helpful.

The purchase cost changes per location. At the very least the transportation cost will be different, even if the purchase cost is the same. The price (purchase cost) paid by the buyer (:Location) is likely to be different by :Location.

What would the CREATE statement look like for :StdTerms? In the statement below I'm assuming that :Item, :Location, and :Supplier nodes have been created already.

CREATE (i}) <- [:FOR] - (:StdTerms {Supplier: s, stdCost: 10, stdLT: 100}) [:TO] -> (l]

The reality is that all of this has date effectivity, meaning that there are price changes - both purchase and transportation - and suppliers come and go, so every :StdTerms has to have a startDate and an endDate. But I can see how to add these fields and modify the query accordingly.

With effective dates involved, there are a couple approaches you can use.

You could create a Node Key Constraint, which is a constraint that a specific set of properties on nodes of this type must be present, and must uniquely identify a node. It comes with a composite index for quick index lookup for equality on all properties of the node key. In this case, if we assume you're using id properties for unique identification of various nodes, then a node key on supplierId, itemId, and locationId isn't enough, due to the effective dates. If you know that terms (for the same item/supplier/location combination) will not change over the course of a single day, then only startDate needs to be included in addition to the other ids to become the node key.

The creation query might look like this:

// assume supplierId, locationId, and itemId are unique for nodes of their label, and are given as parameters along with startDate, endDate, and cost
MERGE (st:StdTerms {supplierId:$supplierId, itemId:$itemId, locationId:$locationId, startDate:$startDate})
ON CREATE SET st.endDate = $endDate, st.cost = $cost, st.stdLT = $stdLT
MATCH (loc:Location {locationId:$locationId}), (item:Item {itemId:$itemId}), (supp:Supplier {supplierId:$supplierId})
MERGE (st)-[:FOR]->(item)
MERGE (st)-[:TO]->(loc)
MERGE (st)<-[:SUPPLIES]-(supp)

As :StdTerms nodes go out of the effective date, you might consider an update operation that can make changes to the node and relationships if necessary. For example, replacing the :StdTerms label with :ExpiredTerms, and possibly replacing relationships accordingly (:ARCHIVED_FOR, :ARCHIVED_TO, etc). This has the consequence of increased upkeep complexity (say a cron job run nightly), but allows you to ensure your realtime queries for active standard terms and traversal of active relationships doesn't get bogged down with historical/archived data.