
Repository Save/Find Depth

scott2

Are the depth fields going to be reintroduced to the Repository objects?

i.e. Repository.save(Object object, int depth)

Or is there a newer, cooler way of facilitating this that I am not aware of?
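
For reference, this is roughly what the depth-aware API looked like when SDN delegated to Neo4j-OGM's Session (a sketch in Kotlin; MyEntity and the id are placeholders):

    import org.neo4j.ogm.annotation.GeneratedValue
    import org.neo4j.ogm.annotation.Id
    import org.neo4j.ogm.annotation.NodeEntity
    import org.neo4j.ogm.session.SessionFactory

    // Hypothetical entity, just to make the sketch complete.
    @NodeEntity
    class MyEntity {
        @Id @GeneratedValue
        var id: Long? = null
    }

    // The depth-aware calls the question refers to: save(entity, depth)
    // and load(type, id, depth).
    fun depthExample(sessionFactory: SessionFactory, entity: MyEntity) {
        val session = sessionFactory.openSession()
        session.save(entity, 1)                                    // entity plus one hop of relationships
        val nodeOnly = session.load(MyEntity::class.java, 42L, 0)  // the node alone, no relationships
    }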


Currently there is no plan to reintroduce this feature 1:1 as it was before:
Save persists all nodes reachable from the object(s) that are to be persisted.
For relationships that do not form some kind of cycle, load currently fetches all modelled data at infinite depth. If it hits a combination of (:StartType)-[:RelType]->(:EndType) more often than the current virtual limit of 5, query creation stops for this "branch".
The limit is fixed right now, and we are still evaluating how to make it configurable and where we can improve the logic. We have already found some edge cases where (really bad) modelling could either put a lot of load on the database or fail to return all wanted relationships. Once we have found the right balance there, we will probably look into making the depth configurable.
I think that distinguishing between non-repeated relationships and the ones "running in circles" could really improve usability and will hopefully lower the need for a query-specific depth.
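
To illustrate with a sketch (the Item type and HAS_CHILD relationship are made up, and the annotation names are the SDN 6 ones; SDN/RX had the same annotations under org.neo4j.springframework.data), a self-referencing mapping is exactly the repeated (:StartType)-[:RelType]->(:EndType) combination the limit applies to:

    import org.springframework.data.neo4j.core.schema.GeneratedValue
    import org.springframework.data.neo4j.core.schema.Id
    import org.springframework.data.neo4j.core.schema.Node
    import org.springframework.data.neo4j.core.schema.Relationship

    // (:Item)-[:HAS_CHILD]->(:Item) repeats at every level of the hierarchy,
    // so query creation stops for this branch once the virtual limit of 5
    // repetitions is hit.
    @Node
    data class Item(
        @Id @GeneratedValue val id: Long? = null,
        val name: String,
        @Relationship(type = "HAS_CHILD") val children: List<Item> = emptyList()
    )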

Does it consider RELATIONSHIP direction at all?

It would be cool if there were a way to describe common subgraphs in the data. I don't know how you would achieve this via annotations. I guess queries will have to be customized to get exactly what you want from a subgraph structure.

I think that it does make perfect sense to save based on the depth of the object being saved.

Would be nice to get more info on this one: how to hack it, configure it, anything to control the flow a bit more. I'm seeing a major performance impact because I don't know how to disable fetching relationships. Either way, I need to fetch one level only or disable relationship fetching entirely.

I have a LOT of relationships on the 1st level and a lot of items within the scope of 5 levels of hierarchy, so fetching just one record means getting 3 types of data at each of depths 1, 2, 3 and 4, and the 5th level is finally just the record itself, if I understood the virtual limit correctly. I can currently see it goes 3 levels deep, and on the 1st level I have 2 relationships, so it grabs 4 types of data for now. I wonder what would happen if I had data filled in on the 2nd and 3rd levels: would it grab extra data there? A sample graph would be:

The left side represents data from a relationship that has no depth

              (1st)
             /      \        The right side can go up to hundreds of records which have the same structure as the 1st
       (2nd)...    (2nd)               (2nd)
                   /  \                 /  \
               (3rd) (3rd)           (3rd) (3rd)  //same for 2nd level
                      / \                   / \
                     (4th)(4th)           (4th)(4th)  //... goes on
                           / \                   / \
                        (5th)(5th)              *   * -- guess SDN/RX stops here
                              / \
                              *  *

Having the option to disable depth completely would still be nice, so I could query the missing pieces with another query. Two queries in my case would work faster than one that grabs a huge amount of data not relevant to me at all.
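
A sketch of that two-query idea, reusing the hypothetical Item type from the earlier sketch (ItemRepository and the Cypher are illustrative only; Neo4jRepository and @Query are the SDN 6 names, SDN/RX has them under org.neo4j.springframework.data):

    import org.springframework.data.neo4j.repository.Neo4jRepository
    import org.springframework.data.neo4j.repository.query.Query
    import org.springframework.data.repository.query.Param

    interface ItemRepository : Neo4jRepository<Item, Long> {

        // Query 1: just the root node, no relationships touched at all.
        @Query("MATCH (i:Item) WHERE id(i) = \$id RETURN i")
        fun findRootOnly(@Param("id") id: Long): Item?

        // Query 2: one level of children, fetched separately and only on demand.
        @Query("MATCH (i:Item)-[:HAS_CHILD]->(c:Item) WHERE id(i) = \$id RETURN c")
        fun findDirectChildren(@Param("id") id: Long): List<Item>
    }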

The implemented solution is now (RC1):
All relationships get mapped as they are defined in the domain. We want to keep the basic ideas of domain-driven design in the framework. This means creating right-sized aggregates and accessing them through a suitable aggregate root.
As I explained a month ago, due to the nature of graphs, we introduced a magic limit for repetitive patterns (cycles over the same node types). This is 2 at the moment of writing.
Basically this is a trade-off between performance and convenience for cases like person.getFriends().getFriends(), where friends are also of type Person.
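
As a sketch, that named example as a mapping (same imports and annotation caveats as the Item sketch earlier):

    // The "running in circles" case: friends of a Person are again Persons,
    // so with the behaviour described above this pattern is only followed
    // two repetitions deep (person.getFriends().getFriends()).
    @Node
    data class Person(
        @Id @GeneratedValue val id: Long? = null,
        val name: String,
        @Relationship(type = "FRIEND") val friends: List<Person> = emptyList()
    )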

Another reason why we chose to go this way is to avoid handling "subgraphs" on the application side, meaning having just an excerpt of the real mapped domain. As a consequence we would have to keep track of the loaded horizon/depth/relationship count/you name it to avoid overwriting existing data at the edges.
An example is an empty collection on a node at the subgraph's edge: is the collection empty because it wasn't loaded before, or was it cleared because the user's intention is to remove all the relationships?
These kinds of problems would come up if we introduced the general freedom to fetch less than "complete graphs".

There is always the option to opt in to custom queries via the @Query annotation, for example, to load exactly the data you need. But keep in mind that persisting the results of such a query directly might ignore some relationships.
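
Continuing the hypothetical ItemRepository sketch from above, that pitfall would look like this:

    // The entity came back from a custom query that returned no relationships,
    // so its children collection is empty. Saving it now looks, to the mapper,
    // as if all HAS_CHILD relationships were deliberately removed.
    fun renameOnly(itemRepository: ItemRepository) {
        val item = itemRepository.findRootOnly(42L) ?: error("no such item")
        itemRepository.save(item.copy(name = "renamed"))  // may delete the HAS_CHILD relationships
    }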

I don't think I fully understand this, or maybe you didn't understand why automatic loading/saving can be more unexpected than helpful.

My problem is that I don't want to save relationships automatically at all. Here's what I'm experiencing, and probably other people will too:

  • Getting something by ID just to update it fetches too much data when I only need a single record.
  • Trying to store new info provided by a user deletes all related nodes, because users don't send relationships back, and that is the expected way for a user to behave. Having the user send 100 records just to update the root one is simply not a good way to go.
  • With Kotlin, @Transient doesn't work on a data class constructor parameter (beta04; I can't test RC1, it won't even find the repository beans). If I put the list as a member in the data class body, it needs to be transient; I have to remember to set it again after every .copy() because of that; and I make it var + nullable for better performance than val + mutable list, since replacing a reference is cheaper than copying lists. Example: 1000 nodes in a list where I need to replace all of their child objects by copying lists... I could also make children var + nullable with a mutable list, so that modifying the existing list is easier than creating a new one with a record added or removed (see the sketch after this list).
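
A sketch of the Kotlin situation from the last point (Entry is a made-up type; @Transient is Spring Data's org.springframework.data.annotation.Transient):

    import org.springframework.data.annotation.Transient

    data class Entry(
        val id: Long? = null,
        val name: String
    ) {
        // Kept in the class body because @Transient on a constructor
        // parameter did not take effect (as of beta04). var + nullable so
        // the reference can be swapped instead of copying whole lists.
        @Transient
        var children: MutableList<Entry>? = null
    }

    fun rename(entry: Entry, newName: String): Entry {
        val renamed = entry.copy(name = newName)
        // copy() only carries constructor parameters, so the transient body
        // property comes back as null and has to be re-set by hand.
        renamed.children = entry.children
        return renamed
    }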

So there are many issues with other parts of the technology stack that cause bugs with the current approach, and unexpected behaviours are frequent in my scenario, as I'm not using it the way some might want me to.

I believe that in time many users will struggle with this approach, as most of them won't expect that storing one node deletes related nodes just because they didn't set the relationships back, and the query they used was a custom one written to speed things up that left out data unnecessary for the given context, like relationships.
A lot of people stopped using EF (.NET) and EF Core because it was too unpredictable for the very same reason: storing data, deleting it, etc.

Anyway, it might be that Neo4j is just not a good fit for this and I should research other DBs. I picked it because it looked great for storing hierarchical data that could be found quickly by searching from parent down to child, since the structure was well known.

EDIT: I just tested this on a new project with RC1, and it now only removes relationships when updating while the @Relationship field is set to an empty list or null. I'm not sure how I ended up without the nodes previously; it was possibly my own bug. Either way, it remains true that updating an entity whose relationship nodes were not fetched will remove them.