Architecture & Concepts

TimescaleDB is implemented as a PostgreSQL extension that hooks deep into the Postgres query planner, data model, and execution engine. This allows TimescaleDB to expose what look like regular tables but are actually an abstraction, or virtual view, over many individual tables that hold the actual data.

This single-table view, which we call a hypertable, is composed of many chunks. Chunks are created by partitioning the hypertable's data in one or two dimensions: by a time interval and, optionally, by a "partition key" such as device ID, location, or user ID. We sometimes refer to this as partitioning across "time and space".

Terminology

Hypertables

The primary point of interaction with your data is a hypertable: the abstraction of a single continuous table across all space and time intervals, which you can query with standard SQL.

Virtually all user interactions with TimescaleDB are with hypertables. Creating tables and indexes, altering tables, inserting data, selecting data, etc. can (and should) all be executed on the hypertable. [Jump to basic SQL operations]
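
As a rough illustration of working against the hypertable rather than its underlying tables, the statements below assume a hypothetical hypertable named conditions with time, location, and temperature columns. These names are invented for this sketch and do not come from the text above.

```sql
-- Assumption: "conditions" is an existing hypertable with columns
-- time (TIMESTAMPTZ), location (TEXT), temperature (DOUBLE PRECISION).

-- Insert through the hypertable; routing to a chunk is handled internally.
INSERT INTO conditions (time, location, temperature)
VALUES (now(), 'office', 21.5);

-- Query the hypertable like any other table.
SELECT location, avg(temperature) AS avg_temp
FROM conditions
WHERE time > now() - INTERVAL '1 day'
GROUP BY location;

-- Schema changes are also issued against the hypertable.
ALTER TABLE conditions ADD COLUMN humidity DOUBLE PRECISION NULL;
```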

A hypertable is defined by a standard schema with column names and types, with at least one column specifying a time value and, optionally, one column specifying an additional partitioning key.

TIP: See our data model for a further discussion of various ways to organize data, depending on your use case; the simplest and most natural is a "wide table", as in many relational databases.

A single TimescaleDB deployment can store multiple hypertables, each with its own schema.

Creating a hypertable in TimescaleDB takes two simple SQL commands: CREATE TABLE (with standard SQL syntax), followed by SELECT create_hypertable().
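
A minimal sketch of those two commands follows. The table name conditions, its columns, and the choice of four space partitions are assumptions made for illustration; the exact arguments accepted by create_hypertable() may differ between releases, so treat this as a starting point rather than a reference.

```sql
-- Step 1: an ordinary table (hypothetical schema for illustration).
CREATE TABLE conditions (
    time        TIMESTAMPTZ      NOT NULL,
    location    TEXT             NOT NULL,
    temperature DOUBLE PRECISION NULL
);

-- Step 2: convert it into a hypertable partitioned by the "time" column.
SELECT create_hypertable('conditions', 'time');

-- Alternative step 2: also partition on "location" as the optional
-- partition key, hashed into 4 partitions.
-- SELECT create_hypertable('conditions', 'time', 'location', 4);
```

The first form partitions by time only; the commented-out form adds the optional second dimension described earlier.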

Indexes on time and the partitioning key are created automatically on hypertables. Additional indexes can also be created, and TimescaleDB supports the full range of PostgreSQL index types.
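
For example, an additional index can be declared with standard PostgreSQL syntax. The sketch assumes a hypothetical hypertable named conditions with location, time, and temperature columns; the index names are likewise invented.

```sql
-- Composite index for queries that filter on location and scan recent data.
CREATE INDEX conditions_location_time_idx
    ON conditions (location, time DESC);

-- Any PostgreSQL index type can be used, for instance a partial index.
CREATE INDEX conditions_hot_idx
    ON conditions (time DESC)
    WHERE temperature > 30;
```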

Chunks

Internally, TimescaleDB automatically splits each hypertable into chunks, with each chunk corresponding to a specific time interval and a region of the partition key's space (assigned by hashing the key).

These partitions are disjoint (non-overlapping), which helps the query planner minimize the set of chunks it must touch to resolve a query.
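
One way to observe this is to inspect the plan of a time-bounded query. The sketch below assumes a hypothetical hypertable named conditions with time and temperature columns; the plan text varies by version and data, so no output is shown. With literal time bounds, the plan would be expected to reference only chunks whose time interval overlaps the requested range.

```sql
-- Literal bounds are used so the planner can compare them directly
-- against each chunk's time range.
EXPLAIN
SELECT avg(temperature)
FROM conditions
WHERE time >= '2017-06-01 00:00:00+00'
  AND time <  '2017-06-02 00:00:00+00';
```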

Each chunk is implemented using a standard database table. (In PostgreSQL internals, the chunk is actually a "child table" of the "parent" hypertable.)
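
Because chunks are child tables, the standard PostgreSQL inheritance catalog can be used to list them. This sketch assumes a hypothetical hypertable named conditions and relies only on the pg_inherits catalog, not on any TimescaleDB-specific function.

```sql
-- List the child tables (chunks) whose parent is the hypertable.
SELECT inhrelid::regclass AS chunk
FROM pg_inherits
WHERE inhparent = 'conditions'::regclass;
```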

Chunks are right-sized, ensuring that all of the B-trees for a table's indexes can reside in memory during inserts. This avoids thrashing when modifying arbitrary locations in those trees.
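
Chunk size is governed largely by the time interval each chunk covers. The sketch below assumes a hypothetical hypertable named conditions, and assumes that the installed release provides set_chunk_time_interval() and accepts an INTERVAL argument; the one-day value is illustrative only and should be derived from your own ingest rate and available memory.

```sql
-- Option A: choose the interval when the hypertable is created.
-- SELECT create_hypertable('conditions', 'time',
--                          chunk_time_interval => INTERVAL '1 day');

-- Option B: change it later. Only chunks created after this call
-- use the new interval; existing chunks keep their boundaries.
SELECT set_chunk_time_interval('conditions', INTERVAL '1 day');
```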

Further, by avoiding overly large chunks, we can avoid expensive "vacuuming" operations when removing deleted data according to automated retention policies. The runtime can perform such operations by simply dropping chunks (internal tables), rather than deleting individual rows.
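
A retention job can therefore be a single call that drops whole chunks. The sketch assumes a hypothetical hypertable named conditions and a three-month retention window. The argument order of drop_chunks() has changed between releases (older releases take the interval first, as shown here, while newer ones take the hypertable first), so check the documentation for the installed version.

```sql
-- Drop every chunk whose data is entirely older than three months.
SELECT drop_chunks(INTERVAL '3 months', 'conditions');

-- Row-by-row alternative, which leaves dead tuples for VACUUM to reclaim:
-- DELETE FROM conditions WHERE time < now() - INTERVAL '3 months';
```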

Single Node vs. Clustering

TimescaleDB performs this extensive partitioning on both single-node deployments and clustered deployments (in development). While partitioning is traditionally used only for scaling out across multiple machines, it also allows us to scale up to high write rates (and improved parallelized queries) even on single machines. The current open-source release of TimescaleDB supports only single-node deployments. Notably, the single-node version of TimescaleDB has been benchmarked on commodity machines with hypertables of over 10 billion rows without a loss in insert performance.

Benefits of Single-node Partitioning

A common problem with scaling database performance on a single machine is the significant cost/performance trade-off between memory and disk. Eventually, our entire dataset will not fit in memory, and we’ll need to write our data and indexes to disk.

Once the data is large enough that not all pages of our indexes (e.g., B-trees) fit in memory, updating a random part of the tree can involve swapping in data from disk. Databases like PostgreSQL keep a B-tree (or other data structure) for each table index so that values in that index can be found efficiently, so the problem compounds as you index more columns.

But because each of the chunks created by TimescaleDB is itself stored as a separate database table, all of its indexes are built only across these much smaller tables rather than a single table representing the entire dataset. So if we size these chunks properly, we can fit the latest tables (and their B-trees) completely in memory, and avoid this swap-to-disk problem, while maintaining support for multiple indexes.
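
To check whether chunks are sized sensibly, the index footprint of each chunk can be compared against available memory. This sketch assumes a hypothetical hypertable named conditions and uses only standard PostgreSQL catalog and size functions.

```sql
-- Per-chunk index and total sizes, largest index footprint first.
SELECT inhrelid::regclass AS chunk,
       pg_size_pretty(pg_indexes_size(inhrelid::regclass))        AS index_size,
       pg_size_pretty(pg_total_relation_size(inhrelid::regclass)) AS total_size
FROM pg_inherits
WHERE inhparent = 'conditions'::regclass
ORDER BY pg_indexes_size(inhrelid::regclass) DESC;
```

If the indexes of the most recent chunks are much larger than the memory available for caching, a shorter chunk interval may be worth considering.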

For more on the motivation and design of TimescaleDB's adaptive space/time chunking, please see our technical blog post.
