Advanced Analytic Queries

TimescaleDB can be used for a variety of analytical queries, both through its native support for PostgreSQL's full range of SQL functionality, as well as additional functions added to TimescaleDB (both for ease-of-use and for better query optimization).

The following list is just a sample of some of its analytical capabilities.

Median/Percentile

PostgreSQL has inherent methods for determining median values and percentiles namely the function ​percentile_cont​ (​PostgreSQL docs​). An example query for the median temperature is:

SELECT percentile_cont(0.5)
  WITHIN GROUP (ORDER BY temperature)
  FROM conditions;

Cumulative Sum

One way to determine cumulative sum is using the SQL command ​sum(sum(column)) OVER(ORDER BY group)​. For example:

SELECT host, sum(sum(temperature)) OVER(ORDER BY location)
  FROM conditions
  GROUP BY location;

Moving Average

For a simple moving average, you can use the ​OVER​ windowing function over some number of rows, then compute an aggregation function over those rows. The following computes the smoothed temperature of a device by averaging its last 10 readings together:

SELECT time, AVG(temperature) OVER(ORDER BY time
      ROWS BETWEEN 9 PRECEDING AND CURRENT ROW)
    AS smooth_temp
  FROM conditions
  WHERE location = 'garage' and time > NOW() - interval '1 day'
  ORDER BY time DESC;

First, Last

TimescaleDB defines functions for ​first​ and ​last​, which allow you to get the value of one column as ordered by another.

SELECT location, last(temperature, time)
  FROM conditions
  GROUP BY location;

See our ​API docs​ for more details.

Histogram

TimescaleDB also provides a ​histogram​ function. The following example defines a histogram with five buckets defined over the range 60..85. The generated histogram has seven bins where the first is for values below the minimun threshold of 60, the middle five bins are for values in the stated range and the last is for values above 85.

SELECT location, COUNT(*),
    histogram(temperature, 60.0, 85.0, 5)
   FROM conditions
   WHERE time > NOW() - interval '7 days'
   GROUP BY location;

This query will output data in the following form:


 location   | count |        histogram
------------+-------+-------------------------
 office     | 10080 | {0,0,3860,6220,0,0,0}
 basement   | 10080 | {0,6056,4024,0,0,0,0}
 garage     | 10080 | {0,2679,957,2420,2150,1874,0}

What analytic functions are we missing? ​Let us know on github​.

results matching ""

    No results matching ""