Practical Introduction to Time Series Databases and Time Series Data
1. What is Time Series data?
Time series data is a series of values in which each value is paired with a timestamp recording when the measurement was taken or the event occurred. The data can arrive at regular or irregular intervals. For example, temperature readings might be recorded at a fixed interval, such as every minute or every hour, while a stock price is recorded every time a trade is completed.
Time Series data is usually a log, or “append-only”: records are rarely updated and are usually only deleted when they expire. You wouldn’t update values in your web server logs, and the same holds true for most Time Series data collections.
Depending on the latency required, individual records can be inserted one at a time, but they are often inserted in bulk. Messaging systems such as Kafka can queue individual records to be inserted in bulk, or client applications can be configured to cache records and insert them at a set interval.
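The client-side caching option can be sketched as a small buffer that flushes when it holds enough records or when enough time has passed since the last flush. This is a minimal illustration under stated assumptions: `BatchingWriter`, `flush_fn`, `max_batch`, and `max_delay_s` are hypothetical names, and `flush_fn` stands in for whatever bulk-insert call your database client actually provides.

```python
import time
from typing import Callable, List, Tuple

Record = Tuple[float, float]  # (timestamp in seconds, value)


class BatchingWriter:
    """Client-side buffer that hands records to `flush_fn` in bulk.

    `flush_fn` is a hypothetical stand-in for a database bulk-insert call.
    """

    def __init__(self, flush_fn: Callable[[List[Record]], None],
                 max_batch: int = 100, max_delay_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.flush_fn = flush_fn
        self.max_batch = max_batch        # flush when this many records are buffered
        self.max_delay_s = max_delay_s    # ...or when this long has passed since the last flush
        self.clock = clock
        self.buffer: List[Record] = []
        self.last_flush = clock()

    def write(self, record: Record) -> None:
        self.buffer.append(record)
        if (len(self.buffer) >= self.max_batch
                or self.clock() - self.last_flush >= self.max_delay_s):
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.flush_fn(self.buffer)
            self.buffer = []  # start a fresh list; the flushed one now belongs to flush_fn
        self.last_flush = self.clock()


# Usage: collect batches in a list instead of writing to a real database.
batches: List[List[Record]] = []
writer = BatchingWriter(batches.append, max_batch=3, max_delay_s=60.0)
for i in range(7):
    writer.write((float(i), 20.0 + i))
writer.flush()  # flush the remainder, e.g. on shutdown
print([len(b) for b in batches])  # [3, 3, 1]
```

In this sketch the time-based flush is only checked when a new record arrives; a production client would likely also use a background timer so that a quiet stream still gets flushed.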
2. What are Time Series Databases?
Simply put, a Time Series database is a database that specializes in storing and querying time series data. While it’s possible to store and query time series data in relational and NoSQL databases, a Time Series database will have specialized time-related functions such as:
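- Time Weighted Averages: When records are inserted at irregular intervals, a normal AVERAGE aggregation gives every record the same weight regardless of how long its value was in effect, so the result can be misleading. GridDB’s TIME_AVG function computes an average weighted by time instead.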
- Time Downsampling: In many applications, Time Series data is recorded at very high resolution but only needs to be queried at a lower resolution, for example to populate a graph. GridDB’s TIME_SAMPLING function returns data at the requested interval, and if there isn’t a record exactly at a particular interval’s timestamp, the value is interpolated from the records before and after it.
- Easier comparison operators: Time Series databases let you compare and manipulate timestamps in multiple ways, rather than only through a simple comparison against a timestamp passed in as a string.
- More effective compression: Since a Time Series database knows that the key values are timestamps, it can compress and index the data it stores more effectively. GridDB’s storeCompressionMode option enables compression, offering up to a 3x reduction in required storage.
- Automatic data expiration: Over time, old data loses its value or no longer needs to be stored. To address this, GridDB and most other Time Series databases can automatically prune data older than a set age on a rolling basis.
- Insert/Append Optimized: While Time Series data can be updated, new data is far more commonly inserted, so many Time Series databases use a log- or transaction-based storage backend.
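To make the first two functions above concrete, here is a hand-written sketch of the arithmetic behind a time-weighted average and interpolated downsampling. It is an illustration only, not GridDB’s implementation: GridDB computes these inside the database with TIME_AVG and TIME_SAMPLING, and the weighting rule used here (the trapezoidal rule, i.e. linear interpolation between neighbouring records) is an assumption that may differ from GridDB’s exact convention. All function names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple

Series = List[Tuple[float, float]]  # (timestamp in seconds, value), sorted by time


def plain_average(series: Series) -> float:
    # Every record counts equally, no matter how long its value was in effect.
    return sum(v for _, v in series) / len(series)


def time_weighted_average(series: Series) -> float:
    # Trapezoidal rule: each gap between two records contributes the mean of its
    # two endpoint values, weighted by the gap's duration.
    area = sum((v0 + v1) / 2 * (t1 - t0)
               for (t0, v0), (t1, v1) in zip(series, series[1:]))
    return area / (series[-1][0] - series[0][0])


def interpolate(series: Series, t: float) -> float:
    # Value at time t, linearly interpolated from the records before and after t.
    times = [ts for ts, _ in series]
    i = bisect_left(times, t)
    if i < len(times) and times[i] == t:
        return series[i][1]  # exact match, no interpolation needed
    if i == 0 or i == len(times):
        raise ValueError("t lies outside the series")
    (t0, v0), (t1, v1) = series[i - 1], series[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def downsample(series: Series, start: float, end: float, step: float) -> Series:
    # One point every `step` seconds from start to end inclusive.
    n = int((end - start) // step)
    return [(start + k * step, interpolate(series, start + k * step))
            for k in range(n + 1)]


# Irregular readings: a long stretch at 10.0, then a short burst at 40.0.
readings: Series = [(0.0, 10.0), (50.0, 10.0), (55.0, 40.0), (60.0, 40.0)]
print(plain_average(readings))          # 25.0
print(time_weighted_average(readings))  # 13.75
print(interpolate(readings, 52.0))      # 22.0
print(downsample(readings, 0.0, 60.0, 20.0))
# [(0.0, 10.0), (20.0, 10.0), (40.0, 10.0), (60.0, 40.0)]
```

The plain average of 25.0 overstates the burst, because two of the four records fall in its final ten seconds; weighting by time gives 13.75, which reflects that the value sat at 10.0 for most of the minute.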
3. What Applications Use Time Series Data?
It can be argued that nearly all data could be Time Series data. For example, you may only care about the current price of a product you wish to buy, but it could also be useful to see its historical prices to decide whether it’s worth waiting for a sale. Thus product information, which would typically be stored in a document-oriented database, is at least partially suitable for storage in a Time Series database.
- Internet of Things
- The Internet of Things (IoT) may be the most talked-about source of Time Series data, with millions of devices recording sensor data. Typical examples include devices that record temperature or that produce environmental events such as doors locking and unlocking. With the vast quantity of data being recorded, it’s imperative to use a Time Series database to efficiently store, process, and analyze the incoming data. GridDB’s Key-Container data model works especially well for IoT workloads, where the Time Series data for each device is stored in its own container.
- Financial Transactions
- A ledger might be the original Time Series database, tracking credits, debits, and balances over time. Receipts and sales histories are likewise timestamped records of metrics. Stock trades also fit the Time Series data model well, with both the most recent value and all previous trade prices and volumes having significance.
- Application Monitoring
- Server logs are one of the simplest, most obvious examples of Time Series data, but application monitoring takes many other forms. Telemetry uploaded by a mobile application can be stored in a Time Series database so developers can examine usage patterns to improve their application. Server metrics can be monitored to see peak and trough usage, allowing system administrators not only to plan capacity better but also to look for anomalies.
4. How is Time Series Data Used?
We’ve talked about what Time Series data is, why you might need a Time Series database, and which applications use Time Series data. Here are some common ways the data is used:
- Billing/Reporting: Reports or bills are generated at regular intervals, often using aggregation functions that compute the SUM or COUNT of a metric.
- Visualization: A plot of a metric over time is one of the most common places we see Time Series data on a day-to-day basis, whether it’s data from the stock market or the weather.
- Alerting: In monitoring applications, if a metric exceeds a threshold, an alert can be sent over a configured messaging system. One common example is Industrial IoT, where machine sensors are monitored so that maintenance staff can be alerted when readings fall outside normal operating thresholds.
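The billing and alerting uses above boil down to two simple operations: SUM and COUNT per fixed period, and a range check on each reading. The sketch below does both in plain Python purely for illustration; a Time Series database would normally run the aggregation in a query. The function names, the hourly period, and the 0 to 100 operating range are hypothetical choices, and the `print` stands in for publishing to a real messaging system.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Reading = Tuple[int, float]  # (Unix timestamp in seconds, metric value)


def aggregate_by_period(readings: List[Reading],
                        period_s: int) -> Dict[int, Tuple[float, int]]:
    """SUM and COUNT of a metric per fixed period, keyed by period start."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for ts, value in readings:
        bucket = ts - ts % period_s  # start of the period containing ts
        sums[bucket] += value
        counts[bucket] += 1
    return {b: (sums[b], counts[b]) for b in sorted(sums)}


def out_of_range(readings: List[Reading], low: float, high: float) -> List[Reading]:
    """Readings outside the normal operating range [low, high]."""
    return [(ts, v) for ts, v in readings if v < low or v > high]


usage = [(0, 1.5), (1800, 2.5), (3600, 4.0), (5400, 120.0)]
print(aggregate_by_period(usage, 3600))  # {0: (4.0, 2), 3600: (124.0, 2)}
for ts, v in out_of_range(usage, 0.0, 100.0):
    # A real system would publish this to its messaging system instead of printing.
    print(f"ALERT at t={ts}: value {v} outside [0.0, 100.0]")
```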
What is GridDB?
Toshiba GridDB™ is a highly scalable NoSQL database best suited for IoT and Big Data.
We live in the era of the Internet of Things (IoT) where billions of devices are interconnected and are generating petabytes of data at an increasing rate. Gaining insights and information from that data and generating value out of it gives a tangible competitive advantage to businesses, organizations, governments, and even individuals.
Organizations should focus on creating value from data that enhances their core products, services, or operational processes rather than spending time dealing with the complexity surrounding Big Data. Big Data, in this case, means data in large quantities, at high frequencies, and in vast varieties.
GridDB is an innovative solution built by Toshiba to solve these complex problems. GridDB is founded on the principle of offering a versatile data store that is optimized for IoT, provides high scalability, is tuned for high performance, and ensures high reliability.
Four Pillars of GridDB
- Optimized for IoT
- High Performance
- High Scalability
- High Reliability/Availability
1. Optimized for IoT
GridDB’s Key Container data model and Time Series functions are built for IoT
The Key Container data model of GridDB extends the typical NoSQL Key-Value store. The Key Container model represents data in the form of collections that are referenced by keys. The key and container are rough equivalents of the table name and table data in Relational Databases (RDB). Data modeling in GridDB is easier than with other NoSQL databases because you can define a schema and design the data much as you would for an RDB.
The Key Container model allows high-speed access to data through Java and C APIs. Data in GridDB can also be queried with TQL, a custom SQL-like query language. Basic searches with a WHERE clause, and high-speed conditional searches through indexing, offer a great advantage for applications that rely on fast search. GridDB supports transactions, including transactions that span multiple records. Transactions in GridDB guarantee ACID (Atomicity, Consistency, Isolation, and Durability) at the container level.
Two types of containers are prominent in GridDB: Collection-Container, a general-purpose container; and TimeSeries-Container which is for managing time series data.
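The Key Container idea, one key per device mapping to its own container of timestamped rows, can be modeled in a few lines. This is a toy in-memory sketch of the concept only and not the GridDB client API (which, per the text above, is exposed through Java and C); the classes `Store` and `TimeSeriesContainer` and the methods `get_or_create`, `put`, and `between` are hypothetical. Range lookups use binary search on the sorted timestamp key, loosely analogous to an indexed conditional search.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

Row = Tuple  # first field is the timestamp (the row key)


class TimeSeriesContainer:
    """Toy container: rows kept sorted by their timestamp row key."""

    def __init__(self, columns: Tuple[str, ...]) -> None:
        self.columns = columns        # schema, e.g. ("timestamp", "temperature")
        self.keys: List[int] = []     # sorted timestamps
        self.rows: List[Row] = []     # rows in the same order as self.keys

    def put(self, row: Row) -> None:
        if len(row) != len(self.columns):
            raise ValueError("row does not match the container schema")
        i = bisect_left(self.keys, row[0])
        if i < len(self.keys) and self.keys[i] == row[0]:
            self.rows[i] = row        # same row key: overwrite
        else:
            self.keys.insert(i, row[0])
            self.rows.insert(i, row)

    def between(self, start: int, end: int) -> List[Row]:
        """Rows with start <= timestamp < end, located by binary search on the key."""
        return self.rows[bisect_left(self.keys, start):bisect_left(self.keys, end)]


class Store:
    """Maps a key (e.g. a device ID) to its own container."""

    def __init__(self) -> None:
        self.containers: Dict[str, TimeSeriesContainer] = {}

    def get_or_create(self, key: str, columns: Tuple[str, ...]) -> TimeSeriesContainer:
        if key not in self.containers:
            self.containers[key] = TimeSeriesContainer(columns)
        return self.containers[key]


store = Store()
schema = ("timestamp", "temperature")
s1 = store.get_or_create("sensor_001", schema)
for row in [(1000, 21.5), (1060, 21.7), (1120, 22.0), (1060, 21.8)]:
    s1.put(row)
store.get_or_create("sensor_002", schema).put((1000, 19.0))
print(sorted(store.containers))   # ['sensor_001', 'sensor_002']
print(s1.between(1000, 1120))     # [(1000, 21.5), (1060, 21.8)]
```

Keeping each device in its own container means a query about one device never has to scan another device’s rows; the second `put` at timestamp 1060 overwrites the first because the timestamp acts as the row key in this model.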
TimeSeries-Container is apt for IoT scenarios where the data is associated with a timestamp. GridDB supports numerous time-series functions, such as:
- Data compression for ever-increasing time series data, which significantly reduces memory usage compared with other DBMSs
- Term release, to automatically delete records that are no longer valid or needed
- Time series data aggregation and sampling functions
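Term release and compression both exploit the fact that rows are ordered by time. The sketch below shows a rolling retention cutoff and delta encoding of timestamps, a generic time-series compression technique chosen here for illustration; it is not a description of how GridDB implements term release or compression internally, and all function names are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List, Tuple

Row = Tuple[int, float]  # (Unix timestamp in seconds, value), sorted by timestamp


def expire(rows: List[Row], now: int, retention_s: int) -> List[Row]:
    """Rolling retention: keep only rows no older than `retention_s` seconds."""
    cutoff = now - retention_s
    keys = [ts for ts, _ in rows]
    return rows[bisect_left(keys, cutoff):]  # rows are sorted, so drop a prefix


def delta_encode(timestamps: List[int]) -> List[int]:
    """First timestamp, then the gap between each pair of consecutive timestamps."""
    return timestamps[:1] + [b - a for a, b in zip(timestamps, timestamps[1:])]


def delta_decode(deltas: List[int]) -> List[int]:
    """Inverse of delta_encode: a running sum restores the original timestamps."""
    return list(accumulate(deltas))


rows = [(100, 1.0), (200, 2.0), (300, 3.0)]
print(expire(rows, now=350, retention_s=200))  # [(200, 2.0), (300, 3.0)]

ts = [1_700_000_000 + 60 * i for i in range(5)]  # one reading per minute
print(delta_encode(ts))  # [1700000000, 60, 60, 60, 60]
```

Regularly spaced timestamps turn into a run of small, identical numbers, which a general-purpose compressor or run-length encoding can store far more compactly than the raw ten-digit values.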
2. High Performance
GridDB’s hybrid composition of In-Memory and Disk architecture is designed for maximum performance
I/O is a common bottleneck in any DBMS that can cause the CPU to be under-utilized. GridDB overcomes this bottleneck with the ‘Memory first, Storage second’ structure where the ‘primary’ data that is frequently accessed resides in memory and the rest is passed on to disks (SSD and HDD). High performance is achieved in GridDB by:
Prioritizing In-Memory processing – In scenarios with large amounts of data, GridDB localizes the data access needed by applications by placing as much ‘primary’ data in the same block as possible. Based on the application’s access pattern and access frequency, GridDB makes efficient use of memory space through a memory-hint function, which reduces memory misses.
Reducing the Overhead – Multi-threaded operations incur operational and communication overhead due to locking and synchronization. GridDB eliminates this by allocating an exclusive memory area and database file to each CPU core/thread. As a result, execution time is shortened and performance improves.
Parallel Processing – GridDB achieves high performance through parallel processing both within a node and across nodes. Parallel processing across nodes is done by distributing a large dataset among multiple nodes (partitioning). Parallelism is made possible by an event-driven engine that processes multiple requests using minimal resources.
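The “exclusive resources per thread” and partitioning ideas can be illustrated with a toy model in which each partition’s data is owned by exactly one worker thread, and requests are routed to a partition by a stable hash of the container key. This sketch does not reproduce GridDB’s actual engine; the class and method names are hypothetical, and because of CPython’s global interpreter lock these threads demonstrate the ownership pattern rather than real CPU parallelism.

```python
import queue
import threading
import zlib
from typing import Dict, List, Optional, Tuple

Item = Optional[Tuple[str, Tuple[int, float]]]  # (container key, row); None = stop


class PartitionedStore:
    """Each partition's data is owned by exactly one worker thread.

    Only the owning thread ever mutates a partition's dict, so that data needs
    no lock; the only synchronization is the hand-off queue to each worker.
    """

    def __init__(self, n_partitions: int) -> None:
        self.queues: List["queue.Queue[Item]"] = [queue.Queue() for _ in range(n_partitions)]
        self.data: List[Dict[str, List[Tuple[int, float]]]] = [{} for _ in range(n_partitions)]
        self.workers = [threading.Thread(target=self._run, args=(i,))
                        for i in range(n_partitions)]
        for w in self.workers:
            w.start()

    def partition_of(self, key: str) -> int:
        # A stable hash, so the same key always routes to the same partition.
        return zlib.crc32(key.encode()) % len(self.queues)

    def put(self, key: str, row: Tuple[int, float]) -> None:
        self.queues[self.partition_of(key)].put((key, row))

    def _run(self, i: int) -> None:
        part = self.data[i]
        while True:
            item = self.queues[i].get()
            if item is None:  # shutdown signal
                return
            key, row = item
            part.setdefault(key, []).append(row)

    def close(self) -> None:
        for q in self.queues:
            q.put(None)
        for w in self.workers:
            w.join()  # after this, all partitions are safe to read


store = PartitionedStore(4)
for i in range(100):
    store.put(f"sensor_{i % 10:03d}", (i, float(i)))
store.close()
print(sum(len(rows) for part in store.data for rows in part.values()))  # 100
```

The same routing rule that assigns keys to threads here could assign them to nodes in a cluster, which is the sense in which partitioning spreads a large dataset across machines.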
3. High Scalability
GridDB scales linearly and horizontally on commodity hardware maintaining excellent performance
Traditional RDBMSs are built on a Scale-Up architecture (adding more capacity to an existing server/node) and excel at transactions and data consistency. NoSQL databases, on the other hand, focus on a Scale-Out architecture (adding smaller nodes to form a large cluster) but fare poorly on transactions and data consistency.
GridDB scales out horizontally with commodity hardware while maintaining the same level of performance. Unlike other scale-out NoSQL databases, GridDB offers strong data consistency at the container level and provides ACID transaction guarantees similar to those of an RDB. GridDB’s proprietary algorithms allow nodes to be added online without stopping the service. GridDB thus offers a dual advantage for businesses that need a scale-out database for large amounts of data but still want to maintain data consistency.
4. High Reliability/Availability
Hybrid cluster management and the highly fault-tolerant system of GridDB are exceptional for mission-critical applications
Network partitions, node failures, and maintaining consistency are some of the major problems that arise when data is distributed across nodes. Distributed systems typically adopt a ‘Master-Slave’ or ‘Peer-to-Peer’ architecture. The Master-Slave approach is good at maintaining data consistency, but master node redundancy is required to avoid a Single Point of Failure (SPOF). Peer-to-Peer avoids a SPOF but suffers from heavy communication overhead among the nodes.
GridDB’s autonomous control cluster architecture combines the advantages of both Master-Slave and Peer-to-Peer styles while overcoming their disadvantages. GridDB’s algorithms select the master node automatically from among the peers, and if the master node fails, operations remain intact because a new master is appointed automatically and immediately. GridDB’s proprietary algorithms avoid the classic distributed computing problem of Split-Brain, which occurs when the cluster is partitioned during a network failure. GridDB also offers various levels of replication based on the availability requirements of the application.
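GridDB’s own master-selection and Split-Brain avoidance algorithms are proprietary and not described here, so the sketch below swaps in a common generic technique purely to illustrate the problem: a group of nodes may elect a master only if it holds a strict majority of the cluster, and it picks the master with a deterministic rule (lowest node ID). The function name and node IDs are hypothetical.

```python
from typing import Optional, Set


def elect_master(cluster: Set[str], reachable: Set[str]) -> Optional[str]:
    """Master chosen by a group of nodes that can still reach one another.

    `cluster` is the full membership; `reachable` is this group (a subset).
    A group without a strict majority returns None and must not act as the
    cluster, so two halves of a partitioned network can never both elect a
    master (no split-brain).
    """
    if not reachable <= cluster:
        raise ValueError("reachable must be a subset of cluster")
    if 2 * len(reachable) <= len(cluster):
        return None
    return min(reachable)  # deterministic rule: lowest node ID wins


nodes = {"n1", "n2", "n3", "n4", "n5"}
print(elect_master(nodes, nodes))                # n1
print(elect_master(nodes, nodes - {"n1"}))       # n2  (master n1 failed)
print(elect_master(nodes, {"n1", "n2"}))         # None (minority side)
print(elect_master(nodes, {"n3", "n4", "n5"}))   # n3  (majority side)
```

Because at most one side of any partition can hold a strict majority, at most one master exists at a time; the trade-off is that a minority side stops serving as a cluster until the network heals.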
Overall, GridDB offers multiple reliability features for mission-critical applications that require high availability and data retention.
When it comes to IoT and Big Data use-cases, GridDB clearly stands out among other databases in the Relational and NoSQL space. Toshiba’s customers in various industry verticals have successfully implemented IoT projects by harnessing the power of GridDB.
5. GridDB performance benchmark
GridDB and MariaDB
GridDB and InfluxDB
GridDB and Cassandra
For more technical details, please visit the website: https://griddb.net/en/