03-29-2021 04:30 AM
Hi folks,
I'm trying to build a solution that allows for BIM (building information model) analysis.
The problem I'm facing is query performance over a large number (100M) of elements, where every element has up to 100 properties out of 16k unique properties.
Below you can find a more detailed description of the problem, the experiments I have already done with Neo4j, the requests I'm struggling with, and what I'm trying to achieve.
Maybe someone could share their experience with similar tasks. I would appreciate any ideas or suggestions.
Basically, a BIM is a DAG with a set of properties attached to every node.
Every BIM evolves over time (new elements are added, some are deleted, some change their values or their place in the hierarchy), and I need to track every change so that I can rewind a model to any state in the past.
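To make the "rewind to any state" requirement concrete, here is a minimal sketch that keeps an append-only change log and replays it up to a requested version. It is only an illustration under stated assumptions, not the approach used in the thread: the `Change` record, the `state_at` function, and the convention that a value of `None` removes a property (and that an element with no properties left counts as deleted) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Change:
    """One recorded modification (hypothetical record layout)."""
    version: int            # model version in which the change happened
    element_id: str
    prop: str
    value: Optional[Any]    # assumption: None means "property removed"


def state_at(log: List[Change], version: int) -> Dict[str, Dict[str, Any]]:
    """Rebuild the model by replaying changes up to and including `version`."""
    state: Dict[str, Dict[str, Any]] = {}
    # sorted() is stable, so changes within one version keep their log order.
    for change in sorted(log, key=lambda c: c.version):
        if change.version > version:
            break
        props = state.setdefault(change.element_id, {})
        if change.value is None:
            props.pop(change.prop, None)
            if not props:
                # Simplification: an element without properties is deleted.
                del state[change.element_id]
        else:
            props[change.prop] = change.value
    return state


log = [
    Change(1, "wall-1", "volume", 12.5),
    Change(1, "wall-2", "volume", 8.0),
    Change(2, "wall-1", "volume", 13.0),   # value changed in version 2
    Change(3, "wall-2", "volume", None),   # element removed in version 3
]

print(state_at(log, 1))
print(state_at(log, 3))
```

A change of place in the hierarchy could be recorded the same way, for example as a change to a hypothetical `parent` property. Replaying the whole log is linear in its length, so a real system would probably add periodic snapshots.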
On top of the BIM I have user-defined hierarchies (UDHs) of elements, which can link together nodes from different levels across multiple BIMs. For example, a UDH could be a set of walls collected across 10 buildings, grouped by floor number.
In the future, every UDH should be able to assign additional properties to model nodes, groups of nodes, or elements from other UDHs.
The solution runs complex analytical queries on these structures, like "calculate the total volume of elements with a specific property that are included in both UDH Foo and UDH Bar", or "get all unique values of property XYZ across all elements of UDH Foo".
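The two example queries can be sketched over plain in-memory structures to pin down their semantics. This is a toy illustration, not the production data model: the `elements` and `udh` dictionaries, the sample values, and the function names are assumptions, and "with a specific property" is read here as "the element has that property set".

```python
from typing import Any, Dict, Set

# Hypothetical sample data: element id -> sparse property map.
elements: Dict[str, Dict[str, Any]] = {
    "wall-1": {"volume": 12.5, "material": "concrete", "fire_rating": "R60"},
    "wall-2": {"volume": 8.0, "material": "brick", "fire_rating": "R30"},
    "wall-3": {"volume": 10.0, "material": "concrete"},
    "door-1": {"material": "wood"},
}

# Hypothetical UDH membership: hierarchy name -> set of element ids.
udh: Dict[str, Set[str]] = {
    "Foo": {"wall-1", "wall-2", "wall-3"},
    "Bar": {"wall-1", "wall-3", "door-1"},
}


def total_volume(prop: str, *hierarchies: str) -> float:
    """Sum `volume` over elements that have `prop` and belong to every UDH."""
    members = set.intersection(*(udh[name] for name in hierarchies))
    return sum(
        elements[e].get("volume", 0.0)
        for e in members
        if prop in elements[e]
    )


def unique_values(prop: str, hierarchy: str) -> Set[Any]:
    """Collect the distinct values of `prop` across one UDH."""
    return {elements[e][prop] for e in udh[hierarchy] if prop in elements[e]}


print(total_volume("fire_rating", "Foo", "Bar"))
print(unique_values("material", "Foo"))
```

Both queries reduce to a set intersection over memberships followed by a filter on sparse properties, which is the shape any chosen database would need to serve efficiently.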
Right now I'm using a custom solution based on a relational database to represent this data. It is rather cumbersome, as it requires building extremely complex queries to manipulate the data.
It seems like Neo4j could drastically simplify the representation of BIMs and UDHs. But I'm concerned about performance, as the data sets are pretty large: every version of a BIM contains around 300K nodes with 100 properties each, and the total number of unique properties per BIM is around 2k. I researched this topic, and it seems that Neo4j does not handle a large number of properties per node well; this is discussed, for example, in the topic Best practices on number of properties for a node. It was also stated that Neo4j does a linear search among node properties, which could result in slow queries that look for elements with a specific property.
I made a prototype where I uploaded several models to a Neo4j database and benchmarked requests. A request that collects all unique property names across all BIM nodes, where each node has 160-240 properties out of 500 unique properties, took
Time was growing linearly, and on the last test the database just stopped responding.
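The benchmarked query itself is not shown in the post. Purely as an assumption, one way to phrase such a request in Cypher is the string below; the Python function next to it mimics the same full scan in memory and counts how many property entries it has to touch, which illustrates why the cost grows linearly with the number of nodes. The helper names and the synthetic data generator are hypothetical.

```python
from typing import Any, Dict, Set, Tuple

# Hypothetical Cypher phrasing of the request (not taken from the thread).
UNIQUE_KEYS_CYPHER = "MATCH (n) UNWIND keys(n) AS key RETURN DISTINCT key"


def unique_property_names(
    nodes: Dict[str, Dict[str, Any]]
) -> Tuple[Set[str], int]:
    """Scan every node and return (distinct property names, entries visited)."""
    names: Set[str] = set()
    visited = 0
    for props in nodes.values():
        for key in props:
            visited += 1
            names.add(key)
    return names, visited


def make_nodes(
    count: int, props_per_node: int = 4, vocabulary: int = 10
) -> Dict[str, Dict[str, Any]]:
    """Generate synthetic nodes with sparse properties from a small vocabulary."""
    return {
        f"n{i}": {f"p{(i + j) % vocabulary}": j for j in range(props_per_node)}
        for i in range(count)
    }


for count in (100, 200, 400):
    names, visited = unique_property_names(make_nodes(count))
    print(count, len(names), visited)
```

The number of distinct names stays constant while the work doubles with the node count, because the scan cannot stop early without a separate index of property names.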
Right now I'm thinking of a hybrid solution that would use a column-oriented database like Vertica/ClickHouse/HBase to store the properties, which should fit well given their sparse nature, and Neo4j to store the relationships between nodes.
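The split described above can be sketched in memory: one structure holds only the graph, the other stores each property as its own sparse column keyed by element id. This is a toy model under stated assumptions, not a Vertica, ClickHouse, HBase, or Neo4j API; the `PropertyColumns` class, the `children` map, and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, List


class PropertyColumns:
    """Sparse column store: property name -> {element id -> value}."""

    def __init__(self) -> None:
        self._columns: Dict[str, Dict[str, Any]] = defaultdict(dict)

    def set(self, element_id: str, prop: str, value: Any) -> None:
        self._columns[prop][element_id] = value

    def names(self) -> List[str]:
        # Listing property names touches only the column index, not the nodes.
        return sorted(self._columns)

    def column(self, prop: str) -> Dict[str, Any]:
        return self._columns.get(prop, {})


# Graph side: only relationships, no properties (hypothetical sample DAG).
children: Dict[str, List[str]] = {
    "building": ["floor-1", "floor-2"],
    "floor-1": ["wall-1", "wall-2"],
    "floor-2": ["wall-3"],
}


def descendants(root: str) -> Iterator[str]:
    """Yield every node reachable from `root`, each one once (DAG-safe)."""
    seen = set()
    stack = list(children.get(root, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        yield node
        stack.extend(children.get(node, []))


props = PropertyColumns()
props.set("wall-1", "volume", 12.5)
props.set("wall-2", "volume", 8.0)
props.set("wall-3", "volume", 10.0)
props.set("wall-1", "material", "concrete")


def total(prop: str, root: str) -> float:
    """Traverse the graph for membership, then read values from one column."""
    col = props.column(prop)
    return sum(col[e] for e in descendants(root) if e in col)


print(props.names())
print(total("volume", "floor-1"))
```

The trade-off in such a design is that every analytical query becomes a two-step operation (graph traversal, then a column lookup by id), so the two stores have to share stable element identifiers.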
For reference, here is what I'm trying to achieve:
Source data:
Example requests:
Timing:
Additional requirements:
03-31-2021 05:47 AM
Hi @DimGun ,
I keep bookmarking this post for reply, and haven't had time for a thoughtful response.
Brief notes to consider:
-ABK
04-06-2021 09:27 AM
Hi Andreas,
Thanks for the reply! I read through the versioner-core documentation. It's a pretty interesting approach to representing relationships between versioned entities, and I should really give it a try once we resolve the issue with the number of properties.