What Is Time-series Data?
What is this "time-series data" that we keep talking about, and how and why is it different from other data?
Many applications or databases actually take an overly narrow view, and equate time-series data with something like server metrics of a specific form:
Name: CPU
Tags: Host=MyServer, Region=West
Data:
2017-01-01 01:02:00 70
2017-01-01 01:03:00 71
2017-01-01 01:04:00 72
2017-01-01 01:05:01 68
But in fact, in many monitoring applications, different metrics are often collected together (e.g., CPU, memory, network statistics, battery life). So, it does not always make sense to think of each metric separately. Consider this alternative "wider" data model that maintains the correlation between metrics collected at the same time.
Metrics: CPU, free_mem, net_rssi, battery
Tags: Host=MyServer, Region=West
Data:
2017-01-01 01:02:00 70 500 -40 80
2017-01-01 01:03:00 71 400 -42 80
2017-01-01 01:04:00 72 367 -41 80
2017-01-01 01:05:01 68 750 -54 79
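To make the difference between the two models concrete, here is a minimal Python sketch that regroups "narrow" rows (one metric per row) into "wide" rows (all metrics collected at the same time in one record). The names `narrow_rows` and `to_wide` are illustrative assumptions, not part of any database API; the values come from the sample data above.

```python
from datetime import datetime

# Narrow model: one (metric, timestamp, value) row per measurement.
t1 = datetime(2017, 1, 1, 1, 2, 0)
t2 = datetime(2017, 1, 1, 1, 3, 0)
narrow_rows = [
    ("CPU", t1, 70), ("free_mem", t1, 500), ("net_rssi", t1, -40), ("battery", t1, 80),
    ("CPU", t2, 71), ("free_mem", t2, 400), ("net_rssi", t2, -42), ("battery", t2, 80),
]


def to_wide(rows):
    """Group narrow rows by timestamp into one dict per point in time."""
    wide = {}
    for metric, ts, value in rows:
        # Create the record for this timestamp on first sight, then add the metric.
        wide.setdefault(ts, {"time": ts})[metric] = value
    # Return the records ordered by timestamp.
    return [wide[ts] for ts in sorted(wide)]


wide_rows = to_wide(narrow_rows)
for row in wide_rows:
    print(row["time"], row["CPU"], row["free_mem"], row["net_rssi"], row["battery"])
```

In the wide form, a question such as "what was free memory when CPU reached 71?" is a lookup within a single record rather than a join across separate per-metric series.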
This type of data belongs to a much broader category than server metrics: it includes temperature readings from a sensor, the price of a stock, the status of a machine, and even the number of logins to an app.
Time-series data is data that collectively represents how a system, process, or behavior changes over time.
Characteristics of Time-series Data
If you look closely at how time-series data is produced and ingested, it has important characteristics that time-series databases like TimescaleDB typically leverage:
- Time-centric: Data records always have a timestamp.
- Append-only: Data is almost solely append-only (INSERTs).
- Recent: New data is typically about recent time intervals; updates and backfills of missing data for older intervals are much rarer.
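These three characteristics can be illustrated with a small in-memory sketch. The class `AppendOnlySeries` below is hypothetical and is not how TimescaleDB is implemented; it only shows why a store can treat "append at the end" as the fast path and an occasional backfill as the exception.

```python
import bisect
from datetime import datetime


class AppendOnlySeries:
    """Hypothetical in-memory series: rows are inserted, never updated."""

    def __init__(self):
        self._rows = []      # (timestamp, value) tuples, kept sorted by timestamp
        self.backfills = 0   # number of inserts that landed before the newest row

    def insert(self, ts, value):
        if self._rows and ts < self._rows[-1][0]:
            # Rare path: data about an older interval arrives late.
            self.backfills += 1
            bisect.insort(self._rows, (ts, value))
        else:
            # Common path: new data is about the most recent interval.
            self._rows.append((ts, value))

    def latest(self):
        return self._rows[-1]

    def rows(self):
        return list(self._rows)


series = AppendOnlySeries()
series.insert(datetime(2017, 1, 1, 1, 2, 0), 70)
series.insert(datetime(2017, 1, 1, 1, 3, 0), 71)
series.insert(datetime(2017, 1, 1, 1, 4, 0), 72)
series.insert(datetime(2017, 1, 1, 1, 5, 1), 68)
series.insert(datetime(2017, 1, 1, 1, 1, 0), 69)  # late-arriving backfill (made-up value)
```

Every record carries a timestamp, nothing is modified in place, and only one of the five inserts touches an older interval.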
The frequency or regularity of data is less important though; it can be collected every millisecond or hour. It can also be collected at regular or irregular intervals (e.g., when some event happens, as opposed to at pre-defined times).
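As a small illustration of regular versus irregular collection, the sketch below measures the gaps between consecutive timestamps. The helper names `intervals_seconds` and `is_regular` are assumptions made for this example; the timestamps are the ones from the sample data earlier in this section, where the last reading arrives one second late.

```python
from datetime import datetime


def intervals_seconds(timestamps):
    """Return the gap, in seconds, between each pair of consecutive timestamps."""
    return [(b - a).total_seconds() for a, b in zip(timestamps, timestamps[1:])]


def is_regular(timestamps):
    """True if all gaps are identical (or there are fewer than two gaps)."""
    return len(set(intervals_seconds(timestamps))) <= 1


readings = [
    datetime(2017, 1, 1, 1, 2, 0),
    datetime(2017, 1, 1, 1, 3, 0),
    datetime(2017, 1, 1, 1, 4, 0),
    datetime(2017, 1, 1, 1, 5, 1),
]

gaps = intervals_seconds(readings)
print(gaps, is_regular(readings))
```

Both the regular prefix and the full, slightly irregular series are equally valid time-series data; only the timestamps matter, not the spacing between them.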
But haven't databases long had time fields? A key difference between time-series data (and the databases that support it) and other data, such as standard relational "business" data, is that changes to the data are recorded as inserts, not overwrites.
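The contrast between overwriting and inserting can be sketched in a few lines of Python. The names `current_state`, `history`, and `record` are hypothetical and chosen only for this illustration; the readings reuse the CPU values from the sample data above.

```python
from datetime import datetime

# Overwrite style: a "current state" table keeps only the latest value per host.
current_state = {}
# Insert style: a time-series log keeps every reading as a new row.
history = []


def record(host, ts, cpu):
    current_state[host] = cpu         # overwrite: the previous value is lost
    history.append((ts, host, cpu))   # insert: previous rows are kept


record("MyServer", datetime(2017, 1, 1, 1, 2, 0), 70)
record("MyServer", datetime(2017, 1, 1, 1, 3, 0), 71)
record("MyServer", datetime(2017, 1, 1, 1, 4, 0), 72)
record("MyServer", datetime(2017, 1, 1, 1, 5, 1), 68)

# Only the log can answer questions about the past, such as the peak CPU value.
peak_cpu = max(cpu for _, _, cpu in history)
print(current_state, len(history), peak_cpu)
```

The overwritten table can report only the current value, while the inserted rows preserve how the value changed over time.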
Time-series Data Is Everywhere
Time-series data is everywhere, but some environments produce it in torrents:
- Monitoring computer systems: VM, server, container metrics (CPU, free memory, net/disk IOPS), service and application metrics (request rates, request latency).
- Financial trading systems: Classic securities, newer cryptocurrencies, payments, transaction events.
- Internet of Things: Data from sensors on industrial machines and equipment, wearable devices, vehicles, physical containers, pallets, consumer devices for smart homes, etc.
- Eventing applications: User/customer interaction data like clickstreams, pageviews, logins, signups, etc.
- Business intelligence: Tracking key metrics and the overall health of the business.
- Environmental monitoring: Temperature, humidity, pressure, pH, pollen count, air flow, carbon monoxide (CO), nitrogen dioxide (NO2), particulate matter (PM10).