Neo4j

awu · ‎09-20-2018

Hi - New to Graph and would like to learn more about modeling and design.

How would you best model an employee to company relationship, where you have a Company entity and a Person entity?

Would it be better to have

MATCH (n:Person)-[r:EMPLOYEE]->(m:Company) WHERE r.occupation = 'Janitor' RETURN n, r, m
or
MATCH (n:Person)-[r:JANITOR]->(m:Company) RETURN n, r, m

Is there a threshold beyond which there are too many relationship types between two nodes? Or is the database better optimized for relationship types than for properties on relationships?

Thanks in advance for your help.

stefan_armbrust · ‎09-21-2018

In most cases having more specific relationship types is preferable to using generic ones. However, it's (in most cases) an antipattern to encode instance identifiers into a relationship type.

The reason for this is performance. In your first example the database has to iterate over all EMPLOYEE relationships and load the properties of each one, which means 2 IO accesses per relationship. If you can be selective on the relationship type instead of a property, you only need one IO access per relationship.
On dense nodes the difference is even bigger, since Neo4j maintains separate relationship chains for each relationship type.
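
One way to check this on your own data is to prefix each query with PROFILE and compare the db hits reported in the plan. This is only a sketch that reuses the labels and relationship types from the question; the actual numbers depend on your data and Neo4j version.

```cypher
// Generic type plus property filter: every EMPLOYEE relationship is expanded,
// then its occupation property is loaded and compared.
PROFILE
MATCH (n:Person)-[r:EMPLOYEE]->(m:Company)
WHERE r.occupation = 'Janitor'
RETURN n, r, m;

// Specific type: only JANITOR relationships are expanded and no
// relationship property has to be read for the filter.
PROFILE
MATCH (n:Person)-[r:JANITOR]->(m:Company)
RETURN n, r, m;
```

Note that PROFILE actually runs the query, so try it on a representative dataset rather than a tiny sample.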

The standard store format of Neo4j allows for 65k different relationship types.

awu · ‎09-21-2018

Thanks @stefan.armbruster for the quick response!

Sounds like a classic situation where I'd give up readability and design to gain some performance improvements. It makes sense from the IO access perspective. Whether it is a better practice to proliferate with multiple relationship types versus one relationship with multiple properties is still a bit murky, but I'll try out both.

This discussion brought up another idea though, whether having multiple Entity types would be beneficial. To wit,

1) MATCH (n:Person)-[r:JANITOR]->(m:Company) RETURN n, r, m
or
2) MATCH (n:Janitor)-[r:EMPLOYEE]->(m:Company) RETURN n, r, m

and I exclude
3) MATCH (n:Person)-[r:EMPLOYEE]->(m:Company) WHERE n.occupation = 'Janitor' RETURN n, r, m for similar reasons as above.

How do most people design their graph databases when trading off against performance? Are the delays negligible initially so it's really a matter of developer's preference? How will they fare at scale?

Thanks again.

stefan_armbrust · ‎09-21-2018

Classic consulting answer: "it depends".
If you consider a janitor to be a subclass of person, you might assign two labels to that node: (p:Person:Janitor).
I assume in your case janitor is only a valid concept in the context of a company, so I'd go with alternative 1). But - as said - it depends on the domain and your understanding of it.
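
For completeness, here is a rough sketch of what the two-label variant could look like. The name values ('Alex', 'Acme') are made up for illustration and are not from the original question.

```cypher
// Hypothetical sample data: one node carrying both the Person and Janitor labels.
CREATE (p:Person:Janitor {name: 'Alex'})-[:EMPLOYEE]->(c:Company {name: 'Acme'});

// Match through the more specific label; the same node still matches (:Person) as well.
MATCH (j:Janitor)-[:EMPLOYEE]->(c:Company)
RETURN j.name, c.name;
```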

mike_r_black · ‎09-24-2018

Another thing to also consider is what I call "Lazy Conversations". Take the email data model example that has been used many times as a graph example. We know we don't do: (user)-[emails]->(user) but that's actually a pitfall of lazy speech. We know it's a much more extensible model to do: (user)-[sends]->(email)-[to]->(user).

In your example, would occupation actually be another node: (user)-[has]->(occupation)-[employed at]->(company)? I would imagine a person could have more than one occupation/job role at a company, or at multiple companies concurrently. Then it's just a matter of writing Cypher optimized for the traversal to match the pattern of data you're looking for, and you'll get the performance you expect from a graph db.
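
As a rough sketch of that idea, assuming an Occupation node per job role plus made-up names, HAS and EMPLOYED_AT relationship types, and a title property (none of which come from the original question):

```cypher
// Hypothetical data: one person holding two roles at two different companies.
CREATE (p:Person {name: 'Alex'}),
       (acme:Company {name: 'Acme'}),
       (globex:Company {name: 'Globex'}),
       (p)-[:HAS]->(:Occupation {title: 'Janitor'})-[:EMPLOYED_AT]->(acme),
       (p)-[:HAS]->(:Occupation {title: 'Security Guard'})-[:EMPLOYED_AT]->(globex);

// Find everyone who works as a janitor, and at which company.
MATCH (p:Person)-[:HAS]->(o:Occupation {title: 'Janitor'})-[:EMPLOYED_AT]->(c:Company)
RETURN p.name, c.name;
```

Because each role is its own node, you can later hang start dates, departments, or other details off it without touching the Person or Company nodes.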

awu · ‎09-24-2018

@mike.r.black - This is great. Thank you.

It seems as if there's another possibility of adding a new node.

Is
MATCH (o:occupation {type:"Janitor"})<-[:IS]-(p:Person)-[:EMPLOYEE_OF]->(m:Company)
any better than
MATCH(n:Person)-[r:EMPLOYEE]->(m:Company) where n.occupation = 'Janitor' ?

I do like how this allows for multiple roles/occupations as is mentioned and the cypher query is easier to understand.

elena · ‎09-05-2019

Hiya,

I know that this is a late response, but it's odd to me that this answer doesn't refer to this excellent piece of documentation:

Quote:

I ran a query against each database 100 times and then took the 50th, 75th and 99th percentiles (times are in ms):

Using a generic relationship type and then filtering by end node property
50%ile: 6.0    75%ile: 6.0    99%ile: 402.60999999999825
 
Using a generic relationship type and then filtering by relationship property
50%ile: 21.0   75%ile: 22.0   99%ile: 504.85999999999785
 
Using a generic relationship type and then filtering by end node label
50%ile: 4.0    75%ile: 4.0    99%ile: 145.65999999999931
 
Using a specific relationship type
50%ile: 0.0    75%ile: 1.0    99%ile: 25.749999999999872

My Summary:

Good: specific relationship type (25.7)
99%ile: 25.749999999999872 Total database accesses: 10,002
(:Person)-[:HAS_EYES]->(:Attr {colour:"blue"})

Worse: end node label (145.6)
99%ile: 145.65999999999931 Total database accesses: 70,001
(:Person)-[:HAS]->(:Attr:Eyes {colour:"blue"})

Pretty Bad: end node property (402.6)
99%ile: 402.60999999999825 Total database accesses: 140,001
(:Person)-[:HAS]->(:Attr {type:"eyes", colour:"blue"})

Very Bad: relationship property (504.8)
99%ile: 504.85999999999785 Total database accesses: 140,001
(:Person)-[:HAS {type:"eyes"}]->(:Attr {colour:"blue"})

NB: I refer to this often as I have been learning; I think it's a terrific summary! It's relatively old, though, and I wonder whether these algos have been updated. It'd be nice to rerun these some time. Ping @mark.needham

mike_r_black · ‎09-22-2019

There's also this video, which is an excellent watch and a great explanation of how to model your graph around the queries you'll be writing: Secret Sauce of Neo4j: Modeling and Querying Graphs

dominicvivek06 · ‎09-22-2019

My blog on the performance
http://www.dominickumar.com/blog/neo4j-relationship-modelling-performance/

mithun_das · ‎01-23-2020

It looks like
(:City)-[:TRANSLATION {code: lang.code}]->(:CityTranslation) is the "Very Bad" performer,
so
(:City)-[:TRANSLATION]->(:CityTranslation {code: lang.code}) will perform better for me.
Is that right?
I was thinking of using lang.code as a dynamic relationship type, but then there could be too many languages. In our use case the user inputs data with a language code (we provide the list of language codes).

Kailash · ‎01-06-2020

My view would be to go with the relationship type. Data modeling and an execution plan will help, though.

Is it better to have many different relationship types or one relationship with properties?