Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
12-21-2021 03:53 AM
We have a starting node (called VariantConfiguration) which has a list of variants. Each Variant has a list of PartListItems and have a list of AttributeOptions, which are defined by an AttributeDefinition and an Attribute. All we want is to load a filled up VariantConfiguration.
In our example there exists only one VariantConfiguration with 3 Variants. Every Variant have 1 PartListItem (overall: 3) and 3 AttributeOptions (overall: 9). So overall there are 3 different AttributeDefinitions and 6 different Attributes. If we load it by the following query there are 9 resulting rows:
MATCH
(config:VariantConfiguration {id : "60d3ccb0-58ff-4d0e-a25c-917fed59ff0a"})-[r_vc_v:FOUND]->(variant:Variant)
MATCH
(variant)<-[r_v_pli:PART_LIST_ITEM_REF]-(part_list_item:PartListItem)
MATCH
(variant)<-[r_v_a:ATTRIBUTE_OPTIONS]-(attribute_option:AttributeOption)
, (attribute_option)-[r_ao_a:HAS_ATTRIBUTE]->(attribute:Attribute)
, (attribute_option)-[r_ao_d:HAS_ATTRIBUTE_DEFINITION]->(attribute_definition:AttributeDefinition)
RETURN config, variant, part_list_item, attribute_option, attribute_definition
Seems ok with a small amount of data but if we have a VariantConfiguration with 148 variants and each have 17 AttributeOptions we will have a result set of 10171 records. The problem seems that a carthesian product has been created above PartListItem, Attribute und AttributeDefinition, so we understand that collect and WITH need to be added to the query (Which is surprisingly unreadable with a much more complex query):
MATCH
(config:VariantConfiguration {id : "60d3ccb0-58ff-4d0e-a25c-917fed59ff0a"})-[r_vc_v:FOUND]->(variant:Variant)
MATCH
(variant)<-[r_v_pli:PART_LIST_ITEM_REF]-(a_part_list_item:PartListItem)
WITH
collect(a_part_list_item) as part_list_item,
config, variant
MATCH
(variant)<-[r_v_a:ATTRIBUTE_OPTIONS]-(a_attribute_option:AttributeOption)
, (a_attribute_option)-[r_ao_a:HAS_ATTRIBUTE]->(a_attribute:Attribute)
, (a_attribute_option)-[r_ao_d:HAS_ATTRIBUTE_DEFINITION]->(a_attribute_definition:AttributeDefinition)
WITH
collect(a_attribute_definition) as attribute_definition,
collect(a_attribute_option) as attribute_option,
collect(a_attribute) as attribute,
part_list_item, config, variant
RETURN config, variant, part_list_item, attribute_option, attribute_definition
This query return 3 records for the VariantConfiguration with 3 variants and 148 records for the VariantConfiguration with 148 records. Fine so far! But OGM does not build our models and relations if we not return the relation variables. But if we do so the carthesian product will be created again, so also all realtions need to be wrapped with collection
So there might be one missing information or one point we do no unterstand. From our point of view the loading of some node with relation nodes should not be so much complicated and not so implicit. The real application will load some more nodes (overall 16) and relations, so we have to write a query which is at least almost 200 lines long just to load some nodes from the database - and almos unreadable. And another implicit fact is that ogm will generate queries, which does not include the collection notation, so already a call with a depth of two will result in a carthesan product with a huge amount of unnecessary data and in most cases with a broken application.
So. What point do we miss?
Thanks and best regards
12-21-2021 07:15 AM
Salut Simon,
I deal with hierarchical data a lot and there is no real silver bullet solution.
There are however some best practices that can be used when designing your graph schema that help when it is time to query it. Here is a link to a post I wrote on the subject.
In your case the best solution is to write your queries in the most legible way. Yes they may be a bit long, with several COLLECTS along the way, but it will be easier to follow and read.
First question: How are you ingesting the query results? I found it easiest to return them as a nested JSON payload.
Regardless here is how I would write the query:
Also I would not alias the relationships (or Nodes) unless you plan to return them.
In my experience this query makes the hierarchy a bit easier to follow in debug.
MATCH (config:VariantConfiguration {id : "60d3ccb0-58ff-4d0e-a25c-917fed59ff0a"})
OPTIONAL MATCH
(config)-[:FOUND]->(variant:Variant),
(variant)<-[:PART_LIST_ITEM_REF]-(a_part_list_item:PartListItem),
(variant)<-[:ATTRIBUTE_OPTIONS]-(a_attribute_option:AttributeOption),
(a_attribute_option)-[:HAS_ATTRIBUTE]->(a_attribute:Attribute),
(a_attribute_option)-[:HAS_ATTRIBUTE_DEFINITION]->(a_attribute_definition:AttributeDefinition)
// First rollup
WITH
config, variant, a_part_list_item, a_attribute_option,
collect(distinct properties(a_attribute)) as HAS_ATTRIBUTE,
collect(distinct properties(a_attribute_definition)) as HAS_ATTRIBUTE_DEFINITION
// Second rollup
WITH
config, variant,
collect(distinct
apoc.map.setLists(properties(a_attribute_option),
['HAS_ATTRIBUTE', 'HAS_ATTRIBUTE_DEFINITION'],
[HAS_ATTRIBUTE, HAS_ATTRIBUTE_DEFINITION]
)
) as ATTRIBUTE_OPTIONS,
collect(distinct properties(a_part_list_item)) as PART_LIST_ITEM_REF
// Third rollup
WITH
config,
collect(distinct
apoc.map.setLists(properties(variant),
['ATTRIBUTE_OPTIONS', 'PART_LIST_ITEM_REF'],
[ATTRIBUTE_OPTIONS, PART_LIST_ITEM_REF]
)
) as FOUND
// final object
WITH
apoc.map.setLists(properties(config), ['FOUND'], [FOUND]) as variant_configuration
RETURN apoc.convert.toJson(variant_configuration) as variant_configuration
I hope this helps.
01-11-2022 03:56 AM
Hy Sam,
happy new year and sry for my late reply!
We simply use OGM as GraphMapper, so it is automatically mapped. That is why we have discovered the large amounts of data only by chance.
Thanks for your example and your blog. Sounds really helpful! I will also share it internally.
But I am still confused why it is so complicated to "simply" load nodes with relations. From my point of view this is one of the core features of neo4j-ogm and neo4j itself (But I also thought cascades should be an core feature), but it seems that it is how it is and we have do deal with :). But, to be honest, I'm not sure if we will use a graph database in the future - if there is no missing point in our understanding of how to use neo4j/ogm and cypher.
Thanks and best regards
Simon
All the sessions of the conference are now available online