Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
09-13-2021 12:07 AM
I have an idea for a data warehouse, but I don't know enough about graph technologies to know if it is a good idea or not.
The persistent historical stage stores everything exactly as it was in the source system, with new records being added with new timestamps and nothing deleted. That way if we improve our data quality (DQ) efforts, we can reprocess the old data.
We have a DQ and Master Data Management (MDM) tool that cleans all incoming data in standardized ways, resolves and merges identities from disparate sources, and loads the data as nodes and edges into the Enterprise Knowledge Graph (EKG). This represents things based on immutable characteristics (as best as we can) and does not take business rules into account. It is the raw data, but clean and centralized. If we have Bob as an employee, and he also buys our product, we would take his data from payroll and customer systems and create one unified Bob node, and then tag or add relationships as needed to indicate that he is both an employee and a customer.
My understanding is that graph databases are better than RDBMSs at "path between" queries. Also, the schema-lite structure is ideal for exploratory analysis. However, I have heard that for typical reporting RDBMSs are better. So we have a process that reads the EKG data, applies the business rules, and outputs standard data marts based on dimensional modeling. The reporting tools (Tableau, etc.) read from the data marts, but the data scientists have access to the EKG.
Is this a good use case for a graph database? Are there pitfalls I don't see? Are there people already doing this successfully? If so, how can I learn from them?
All the sessions of the conference are now available online