How to model a product hierarchy relationship?

lingvisa
Graph Fellow

For example:

Product -> Car -> ElectricCar -> Tesla Model S
Product -> Phone -> SmartPhone -> iPhone 8

Should Product -> Car -> ElectricCar be modelled as an IS_A relationship, as above, or should these 3 levels of product categories all become labels on 'Tesla Model S' without a hierarchy, or both?

9 REPLIES

I think you'll find that, as with many things, the answer is "it depends". There are many ways to model this, and neither of your suggestions is wrong. The best advice I can give is to identify how you're going to query the data. That will give you direction on how to model your graph.

I agree with @mike.r.black: there is a near-infinite number of ways you could model your data. The IS_A approach isn't bad at all; it sounds OK. But the key insight is that data models exist to facilitate answering the questions you have about your data. Those questions are your queries.

So you should start with some notion of what questions you want to answer, and that will give you great insight into how to structure the model to make that easy.

Going the other way around (model first, queries later) usually means that when you come to write your queries, they turn out awkward or difficult, because the model wasn't made for them.

@david.allen, one question about the IS_A approach: for some nodes (labels), I won't have any instances to associate with them:

Product -> Car -> ElectricCar -> Tesla Model S

'Product' and 'Car' won't have any instances, because they are not specific enough. All actual cars will be instances of a subtype, i.e. ElectricCar. Is this allowed in Neo4j? I remember reading somewhere that each node must refer to an instance. The IS_A hierarchy is a sort of ontology in the traditional sense.

This is easily solved by creating instances of your concepts. Use a node label like :Category and then store "Product" as an individual category, and "Car" as an individual category. In this sense, they are singleton nodes and act like reference metadata you store, so that you can associate things to them.

lingvisa
Graph Fellow

In this case, to express the hierarchy, Category has to have an IS_A relationship to itself, i.e. ElectricCar is a Car, and Car is a Product?

Category IS_A Category, which links to itself.

No - in the example I'm providing, Category isn't a node, it's a Label.

I.e. CREATE (car:Category { name: "Car" })<-[:IS_A]-(electric:Category { name: "Electric Car" })
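Extending that one-liner to the whole chain from the question might look like the sketch below. It is only an illustration: the `:Item` label for the concrete product is a hypothetical name (the thread never names one), and `MERGE` is used instead of `CREATE` so that re-running the statement does not duplicate the singleton category nodes.

```cypher
// Category nodes are singletons, so MERGE avoids duplicates on re-run.
MERGE (product:Category {name: "Product"})
MERGE (car:Category {name: "Car"})
MERGE (electric:Category {name: "Electric Car"})
// Each IS_A arrow points from the more specific category to the more general one.
MERGE (car)-[:IS_A]->(product)
MERGE (electric)-[:IS_A]->(car)
// :Item is a hypothetical label for a concrete product instance.
MERGE (tesla:Item {name: "Tesla Model S"})
// The instance attaches only to the bottom category of the hierarchy.
MERGE (tesla)-[:IS_A]->(electric)
```

"Product" and "Car" end up as ordinary nodes carrying the `:Category` label, so there is nothing unusual about them having no products attached directly.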

That makes sense, thanks.

One more comment: in such subClassOf hierarchies, an instance is only associated with the bottom class, i.e. Tesla Model S IS_A Electric_Car, where Electric_Car is the bottom class, although it is a Car and a Product as well. No instances will be directly associated with 'Car' or 'Product', or with any intermediate nodes in the hierarchy. How does this solution compare with directly putting Product, Car and Electric_Car as 3 labels on 'Tesla Model S', without creating the hierarchy? What are the pros and cons of these 2 approaches? From a search perspective, it seems multiple labels will be more efficient, while the IS_A relationship needs inference to answer questions like "Is Tesla Model S a car?" The benefit of the hierarchy approach is that it can organize all cars under the "Car" node, creating a more connected graph visually.

No instances will be directly associated with 'Car' or 'Product', or with any intermediate nodes in the hierarchy. How does this solution compare with directly putting Product, Car and Electric_Car as 3 labels on 'Tesla Model S', without creating the hierarchy? What are the pros and cons of these 2 approaches?

The pros and cons of these two approaches depend on your use case and on what you're trying to do with this data, and no one else can answer this for you. More analysis of what you're trying to accomplish is needed to make this answerable.

From a search perspective, it seems multiple labels will be more efficient, while the IS_A relationship needs inference to answer questions like "Is Tesla Model S a car?"

I wouldn't worry about efficiency first. Worry about simplicity of query, and fidelity to your overall data model first. Worry about query efficiency only later when you have a specific query in mind. But for the record, if you index the "name" attribute, and you look up a node by both its label and name, that's going to be a very efficient lookup, and adding extra labels isn't really going to help you much.
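To make that concrete, here is a rough sketch of the index and of both ways to answer "Is Tesla Model S a car?". Several things are assumptions rather than anything stated in the thread: the index name `category_name`, the hypothetical `:Item` label for concrete products, and the Neo4j 4.x-or-later index syntax.

```cypher
// Neo4j 4.x+ syntax (assumed); older releases use: CREATE INDEX ON :Category(name)
CREATE INDEX category_name IF NOT EXISTS FOR (c:Category) ON (c.name);

// Lookup by label and name, which the index above can serve.
MATCH (c:Category {name: "Car"})
RETURN c;

// Hierarchy approach: walk one or more IS_A hops up from the instance.
// :Item is a hypothetical label for concrete products.
MATCH (i:Item {name: "Tesla Model S"})
OPTIONAL MATCH (i)-[:IS_A*]->(c:Category {name: "Car"})
RETURN count(c) > 0 AS isCar;

// Multi-label approach: assumes the node was created as
// (:Product:Car:ElectricCar {name: "Tesla Model S"}).
MATCH (n:Car {name: "Tesla Model S"})
RETURN count(n) > 0 AS isCar;
```

The "inference" in the hierarchy version is just the variable-length `[:IS_A*]` pattern, so both checks stay short. Which one reads better depends on the other queries you need.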

The benefit of the hierarchy approach is that it can organize all cars under the "Car" node, creating a more connected graph visually.

Maybe. But this goes to the details of your use case. Caring about how connected the graph looks implies that your primary goal is to visualize it, and that may not be the case.

The better way to go about this is to start with your "money queries". Why are you building this system in the first place? Usually, it boils down to 5-7 "money queries" at most: the things you want your database to be able to answer that drive most of the value. While visualizing the graph is quite nice, usually that's not the primary purpose. Focusing on those money queries first will crystallize a lot about how best to design the data model and arrange things for easy querying. Without knowing what questions you want to ask of the graph, it's near impossible for anyone to give good modeling advice, because data models exist to facilitate answers to queries.
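As a purely hypothetical illustration, suppose one of those money queries were "which products are cars of any kind?". With the `:Category` plus IS_A model, and assuming a made-up `:Item` label for concrete products, it could be sketched as:

```cypher
// Hypothetical money query: every product that sits anywhere under "Car".
// :Item is an assumed label for concrete products; IS_A* follows any number of hops.
MATCH (i:Item)-[:IS_A*]->(:Category {name: "Car"})
RETURN DISTINCT i.name AS product
ORDER BY product
```

Writing a handful of queries like this before settling on the model shows quickly whether the hierarchy, the labels, or both make them easy to express.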