Why choose TimescaleDB over NoSQL?
Compared to general NoSQL databases (e.g., MongoDB, Cassandra) or even more specialized time-oriented ones (e.g., InfluxDB, KairosDB), TimescaleDB differs both qualitatively and quantitatively:
- Normal SQL: TimescaleDB gives you the power of standard SQL queries on time-series data, even at scale. Most (all?) NoSQL databases require either learning a new query language or using something that is at best "SQL-ish" (which still breaks compatibility with existing tools).
- Operational simplicity: With TimescaleDB, you only need to manage one database for your relational and time-series data. Otherwise, users often need to silo data into two databases: a "normal" relational one, and a second time-series one.
- JOINs can be performed across relational and time-series data.
- Query performance: TimescaleDB is faster across a varied set of queries. On NoSQL databases, more complex queries are often slow or fall back to full table scans, and some of these databases cannot support many natural queries at all.
- PostgreSQL management: TimescaleDB is managed like PostgreSQL and inherits its support for varied datatypes and indexes (B-tree, hash, range, BRIN, GiST, GIN).
- Native support for geospatial data: Data stored in TimescaleDB can leverage PostGIS's geometric datatypes, indexes, and queries.
- Third-party tools: TimescaleDB supports anything that speaks SQL, including BI tools like Tableau.
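As a rough illustration of the JOIN point above, the sketch below runs an ordinary SQL JOIN between a time-series table and a relational metadata table. It is not TimescaleDB itself: Python's built-in `sqlite3` module is used only as a stand-in so the example runs anywhere, and the table and column names (`devices`, `conditions`, `ts`, `temperature`) are hypothetical. On TimescaleDB, the time-series table would typically also be converted to a hypertable with `create_hypertable`; that step is omitted here.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for a SQL database so the sketch
# is self-contained. All table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Relational metadata: one row per device
    CREATE TABLE devices (
        device_id INTEGER PRIMARY KEY,
        location  TEXT NOT NULL
    );
    -- Time-series data: many timestamped readings per device
    CREATE TABLE conditions (
        ts          TEXT NOT NULL,
        device_id   INTEGER NOT NULL REFERENCES devices(device_id),
        temperature REAL
    );
""")

conn.executemany(
    "INSERT INTO devices VALUES (?, ?)",
    [(1, "office"), (2, "garage")],
)
conn.executemany(
    "INSERT INTO conditions VALUES (?, ?, ?)",
    [
        ("2024-01-01 00:00:00", 1, 20.0),
        ("2024-01-01 00:05:00", 1, 22.0),
        ("2024-01-01 00:00:00", 2, 10.0),
        ("2024-01-01 00:05:00", 2, 14.0),
    ],
)

# Standard SQL: join readings to device metadata, then aggregate per location.
query = """
    SELECT d.location, AVG(c.temperature) AS avg_temp
    FROM conditions AS c
    JOIN devices AS d ON d.device_id = c.device_id
    WHERE c.ts >= '2024-01-01 00:00:00'
    GROUP BY d.location
    ORDER BY d.location
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Because the query is plain SQL, the same statement shape is what a BI tool or any other SQL client would issue, with both tables living in a single database.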
When not to use TimescaleDB?
Then again, if any of the following is true, you might not want to use TimescaleDB:
- Simple read requirements: If you simply want fast key-value lookups or single-column rollups, an in-memory or column-oriented database might be more appropriate. The former clearly does not scale to the same data volumes, however, while the latter performs significantly worse on more complex queries.
- Very sparse or unstructured data: While TimescaleDB leverages PostgreSQL's support for JSON/JSONB formats and handles sparsity quite efficiently (bitmaps for NULL values), schema-less architectures may be more appropriate in certain scenarios.
- Heavy compression is a priority: Benchmarks show TimescaleDB running on ZFS getting around 4x compression, but compression-optimized column stores might be more appropriate for higher compression rates.
- Infrequent or offline analysis: If slow response times are acceptable (or fast response times are needed only for a small number of pre-computed metrics), and if you don't expect many applications or users to access that data concurrently, you might avoid using a database at all and instead just store the data in a distributed file system.