Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
04-27-2020 12:29 PM
Hi there,
I got in an argument with someone that wants to add more links the the schema.
His argument is that it's faster if the entity is linked directly to the entity I want to search.
My argument is that it is better to have the most natural model possible, that mostly resemble reality. and that even if an entity is 4 link deeper, it will be fast since traversing node and their linked nodes is linear. Also, making the Cypher query will be closer to natural language and easier to build. + I don't want to maintain 'fake' link just for performance reason.
Am I wrong? Is adding link that are not natural a good way to link entity and have faster queries?
Example is for a virus on a computer
My model
User -> [HasFile] -> File -> [Contains] -> Virus
His model that is challenging mine
User -> [Has] -> Virus (I don't agree with this one)
User -> [Is_Owner] -> File
File -> [Has] -> Virus
Some Use case that the graph needs to answer:
04-27-2020 01:01 PM
Hi
User -> [Has] -> Virus
User -> [Is_Owner] -> File
File -> [Has] -> Virus
If it was this few relationships, I would agree with your simple model.
If "File -> [Has] -> Virus" has changed, "User -> [Has] -> Virus" must also be changed. This is because it requires dual management.
But if I have a lot of relationships, such as network devices, I'll make each relationship with the service layer, the logic layer, and the physical layer.
04-27-2020 01:21 PM
Maybe I was not clear, but I want to know the difference in complexity from the last 2 use-case query.
Is one model more performant than the other?
I find adding a link between User and Virus to be hard to understand and hard to maintain on my side. A user does not have a virus, it's the file that contains the virus, the user has it 'indirectly', so I would not add a direct link there.
1- Match (user:User)-[Has]-(File)-[Has]-(virus:Virus) where virus.hash= '123' return user
vs
2- Match (user:User)-[Has]-(virus:Virus) where virus.hash= '123' return user
Any difference in performance? Thanks
I had in mind the graph finding algorithm does not need to traverse the whole Tree from the top (User), but instead will find the match first (Virus) and go up the tree to return the requested element.
04-28-2020 08:28 PM
If the goal is to query with a user name and get to the virus they have, I think your buddy is going to be right to have the direct relationship. Now is that a versatile model and possible a headache to maintain, possibility yes.
@maxdemarzi has one of my favorite videos to watch and watch over again about understanding how you moved can affect query performance: Secret Sauce if Neo4j
04-29-2020 06:05 AM
Thanks @mike.r.black for your reply.
Ill check the video to understand better.
To add to the original problem, only "Files" entity with Virus are stored in the model.
For the time complexity of the matching algorithm on finding the Virus of a User, It will need to be linear with the number of File that a user has? So If a user as 10 file, there will be 10 link to pass by to find their File, then 10 link to pass to find their Virus (leaf node), On the proposed model with direct link to Virus, there will also be 10 links to pass correct? Is passed through a secondary entity that costly? For me, that is still O(NumberOfFile) in both case
All the sessions of the conference are now available online