IoT DataModel for Sensor data - Time-series

Dear All,
Wishing you all a Very Happy New Year!

Quick one: I recently completed a data model for an Industrial IoT use case, in which I modeled both the asset hierarchy and the sensor data. The time-series data arrives every minute from around 100 sensors on the asset (plant).
My peers are asking why I need a graph for time series. I firmly believe that by combining the time series with transactions, I can help my customer find a specific faulty part/component among 100 plants at a specific time with low latency. Even with a 20-year history of sensor data, my time-series data model lets me retrieve the data quickly.

Basically, I created a node 'SensorValue' that holds the timestamp and all sensor values (assume 5 sensor values as properties). I also created Year, Month, Day, Hour and Minute nodes and attached them to SensorValue:
(:Minute)<-[:REC_MIN]-(sv:SensorValue {Timestamp: "01-01-2019 01:01", S1Value, S2Value})

(:Hour)<-[:REC_HR]-(sv)
(:Day)<-[:REC_DAY]-(sv)

I could link Hour to Day as well, but my point is that once SensorValue is tagged with its time parts, I feel any query can easily traverse to a specific time slice. For example,
MATCH (:SensorValue)-[:REC_DAY]->(:Day {day: 17}) gets the sensor data recorded on the 17th day. To narrow it down, we can filter on the Year value.
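To make the time-part tagging concrete, here is a minimal Python sketch (not the poster's code) that splits a timestamp like the one above into the Year/Month/Day/Hour/Minute keys a SensorValue would be linked to. It assumes the timestamp format is day-month-year, which the post does not state, and the function name `time_parts` is hypothetical.

```python
from datetime import datetime

# Assumption: timestamps look like "DD-MM-YYYY HH:MM". The post does not say
# whether the first field is the day or the month.
TIMESTAMP_FORMAT = "%d-%m-%Y %H:%M"

def time_parts(timestamp: str) -> dict:
    """Return the time-part keys for one SensorValue timestamp."""
    ts = datetime.strptime(timestamp, TIMESTAMP_FORMAT)
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        "minute": ts.minute,
    }

parts = time_parts("17-03-2019 16:05")
print(parts)
```

If the Day nodes are shared across months, as the filter on Year suggests, then `day: 17` on its own identifies the 17th of every month, so a query for one calendar date would need the year and month keys as well.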

I've tried this on a subset of the data and it works fine, but I would like any input or suggestions on whether there is a better way to do it. I was also thinking of using a fan-out method to partition the nodes, to avoid having too many relationships connected directly to a 'main node', in case that raises performance questions.

AWS has now announced a time-series database for fast retrieval of time-series data. I feel the principle behind its data model would be similar to what I'm describing here (since I'm crazy about Neo4j).

Please let me know if any of you have tried a data model for time series, and share any best practices
and comments or suggestions on my data model.

The problem statement is something like:
"Get me all the temperature (property) and pressure (property) values on a particular day between 4 pm and 6 pm (since there was an extreme weather condition), out of 5 years of data (one record every minute)."
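As a rough illustration of how that window maps onto the time parts, the Python sketch below lists the hour keys that cover a time range. It is only a sketch under stated assumptions: the function name `hour_keys` is hypothetical, the example date is made up, and the window is treated as half-open (4 pm included, 6 pm excluded).

```python
from datetime import datetime, timedelta

def hour_keys(start: datetime, end: datetime) -> list:
    """List the (year, month, day, hour) keys covering start <= t < end."""
    keys = []
    # Snap to the top of the hour so a window starting mid-hour still
    # includes the hour it starts in.
    current = start.replace(minute=0, second=0, microsecond=0)
    while current < end:
        keys.append((current.year, current.month, current.day, current.hour))
        current += timedelta(hours=1)
    return keys

# Hypothetical day: 17 January 2019, 4 pm to 6 pm.
keys = hour_keys(datetime(2019, 1, 17, 16, 0), datetime(2019, 1, 17, 18, 0))
print(keys)
```

With per-minute records, each of those hour keys corresponds to at most 60 SensorValue records, so the query only has to touch the records hanging off two hour slices rather than 5 years of data.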

Thanks!

Best Regards,
Senthil Chidambaram

4 REPLIES

Hi @senthilc78

I am on a similar quest too, but a bit late.

I've just started to envision a data model and came across your post. One thing I was contemplating is keeping the sensor data separate, in another database such as a key-value store.

Please let me know if you have figured anything out.

Hi Mangesh! Excuse my delayed response. I hope you have figured it out by now.
Here is what I ended up with: I basically used a document-based DB and, while ingesting the data, flagged the sensor(s) with a new property whenever a value crossed some threshold (as you know, a sensor value mostly stays the same until it crosses its upper/lower limit). Based on that flag, I created a subset of the graph (a new model) and used it for contextual insights. So we don't need to check the overall population, only the subset: the influencers or outliers 🙂
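The flag-at-ingest idea could look roughly like the Python sketch below. Everything in it is an assumption for illustration: the limits, the sensor names, the `out_of_range` property name and the `flag_reading` function are all hypothetical, and the actual ingestion pipeline is not described in this thread.

```python
# Hypothetical upper/lower limits per sensor type.
LIMITS = {"temperature": (10.0, 80.0), "pressure": (0.9, 1.1)}

def flag_reading(sensor: str, value: float) -> dict:
    """Attach an 'out_of_range' flag to a reading while ingesting it."""
    low, high = LIMITS[sensor]
    return {
        "sensor": sensor,
        "value": value,
        "out_of_range": not (low <= value <= high),
    }

readings = [
    ("temperature", 25.0),
    ("temperature", 95.5),
    ("pressure", 0.85),
    ("pressure", 1.0),
]
flagged = [flag_reading(sensor, value) for sensor, value in readings]

# Only the flagged subset would be carried over into the graph model.
outliers = [r for r in flagged if r["out_of_range"]]
print(outliers)
```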

With Smiles,
Senthil Chidambaram

For any time-series database, my first preference is (1) Cassandra and then (2) MongoDB.
Cassandra -> Advanced Time Series with Cassandra | Datastax
MongoDB -> Use bucket pattern -> Building with Patterns: The Bucket Pattern | MongoDB Blog
Redis -> is a key-value store, but it is usually used as a cache for storing intermediate results for an application.
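For readers unfamiliar with the bucket pattern mentioned above, the general idea is to store many measurements in one document per time span instead of one document per measurement. The Python sketch below shows the grouping step only, with plain dictionaries standing in for documents; the field names and the one-hour bucket size are assumptions, and no database calls are made.

```python
from collections import defaultdict
from datetime import datetime

def bucket_by_hour(readings):
    """Group (sensor_id, timestamp, value) readings into one document per sensor per hour."""
    buckets = defaultdict(list)
    for sensor_id, ts, value in readings:
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        buckets[(sensor_id, hour_start)].append({"minute": ts.minute, "value": value})
    return [
        {
            "sensor_id": sensor_id,
            "hour_start": hour_start,
            "count": len(samples),
            "samples": samples,
        }
        for (sensor_id, hour_start), samples in buckets.items()
    ]

# Hypothetical per-minute readings from one sensor.
readings = [
    ("S1", datetime(2019, 1, 17, 16, 0), 35.5),
    ("S1", datetime(2019, 1, 17, 16, 1), 35.7),
    ("S1", datetime(2019, 1, 17, 17, 0), 36.2),
]
docs = bucket_by_hour(readings)
print(len(docs))
```

With one reading per minute, an hourly bucket holds at most 60 samples, so a year of data for one sensor becomes a few thousand documents rather than roughly half a million.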

There are 4 kinds of NoSQL data model: document, columnar, key-value and graph.

Cassandra and MongoDB use a hash algorithm to fetch your keys more quickly (and the data is also partitioned).

Hope this helps.

Some key pointers & updates:
1. In my problem statement, I used aggregated sensor values (min/max, avg, deviation) per day and stored those in Neo4j. Please note that I am not storing every second of 'telemetry' data in Neo4j, since Neo4j is not meant for that. It won't stop you from storing it, but it would take up more storage.

2. I captured only the deviations (low/high), got the aggregated value for each type of sensor, and connected them to get the contextual insights.

STEP 1: Build your asset hierarchy in the Neo4j graph model, instantiated through Cypher batch jobs.
STEP 2: Aggregate the sensor values (outside Neo4j) for each device per day. For some critical devices only, I took hourly aggregated sensor values. Link these with the Day/Hour/Asset nodes. To avoid the 'super dense node' problem,
I used multiple relationship types, split by day, week and month.
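The aggregation in STEP 2 might be sketched in Python as follows. This is not the actual job used here, just a minimal illustration: the function name `daily_aggregates`, the device ID and the readings are hypothetical, and "deviation" is assumed to mean the population standard deviation.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev

def daily_aggregates(readings):
    """Reduce (device_id, timestamp, value) readings to one summary per device per day."""
    groups = defaultdict(list)
    for device_id, ts, value in readings:
        groups[(device_id, ts.date())].append(value)
    return {
        key: {
            "min": min(values),
            "max": max(values),
            "avg": mean(values),
            # Assumption: "deviation" is the population standard deviation.
            "deviation": pstdev(values),
        }
        for key, values in groups.items()
    }

# Hypothetical readings for one device across two days.
readings = [
    ("pump-1", datetime(2019, 1, 17, 16, 0), 10.0),
    ("pump-1", datetime(2019, 1, 17, 16, 1), 20.0),
    ("pump-1", datetime(2019, 1, 18, 9, 0), 30.0),
]
summary = daily_aggregates(readings)
print(summary)
```

Each summary is then a single small record per device per day, which is the kind of value that gets linked to the Day and Asset nodes instead of the raw per-minute telemetry.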

Takeaways:
1. Put simply, Neo4j is the best fit when there are many-to-many relationships connecting different entities/things (nodes)
and you want to traverse 10, 25, ... n levels from the top nodes to the leaf nodes to get contextual insights/patterns for better decisions and predictions.

2. Neo4j is not meant for storing just telemetry data. You can store it there, but another time-series or document DB would be better for this purpose.

3. Aggregated metrics/sensor values can be brought into Neo4j, and they definitely help us find both known and hidden patterns when we connect different logical entities rather than only telemetry data.
