# Sizes

## Row Size

Samples were taken from an 8-node, 90-pod cluster being monitored by a single Prometheus replica, for roughly an
8-hour period of low overnight usage. Approximately 2.13 million samples were collected per hourly chunk, or 18.13
million total.
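
The per-hour sample counts can be reproduced with a `time_bucket` aggregate over the samples table (a sketch; the
table name matches the schema described below and the time range is a placeholder for the measurement window):

```sql
-- Count samples per hourly bucket over the measurement window (times are placeholders).
SELECT
  time_bucket('1 hour', time) AS bucket,
  count(*) AS samples
FROM metric_samples
WHERE time BETWEEN '2020-03-01 00:00' AND '2020-03-01 08:00'
GROUP BY bucket
ORDER BY bucket;
```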

Sample rows weighed an average of `262.55` bytes after chunking and indexing, but without compression. The `lid`
column is a consistent 29 bytes, while the `name` varies between 38 and 69 bytes.
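
The per-column figures can be spot-checked with `pg_column_size` over a recent window of samples (a sketch; the
per-row average above also includes tuple headers and index entries, so it is larger than the sum of the column
sizes):

```sql
-- Per-column storage sizes for the lid and name columns on recent samples.
SELECT
  avg(pg_column_size(lid)) AS avg_lid_bytes,
  min(pg_column_size(name)) AS min_name_bytes,
  max(pg_column_size(name)) AS max_name_bytes
FROM metric_samples
WHERE time > now() - INTERVAL '1 hour';
```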

These samples had 32793 unique label sets (`lid`s), of which 54.71% were still active by the end. Label rows weighed
an average of `2686.95` bytes after indexing (BTREE for the `lid`, GIN for the `labels`), or 84MB total.
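
The label set count and average row weight can be measured directly against the labels table (a sketch, assuming
the `metric_labels` table described later, with one row per `lid`):

```sql
-- Unique label sets, total on-disk size (including indexes), and average bytes per label row.
SELECT
  count(*) AS label_sets,
  pg_size_pretty(pg_total_relation_size('metric_labels')) AS total_size,
  pg_total_relation_size('metric_labels')::float / count(*) AS avg_row_bytes
FROM metric_labels;
```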

Removing labels from the samples table and deduplicating them yielded a 59.75% reduction in total disk space.

Enabling compression on the `metric_samples` table for chunks older than 6 hours, using `orderby = 'time'` and
`segmentby = 'lid'`, brought each chunk from 530MB to below 10MB; the settings and the resulting per-chunk
statistics are shown below.
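
The settings behind that change look roughly like the following (a sketch, assuming TimescaleDB 1.x, where a
background policy is added with `add_compress_chunks_policy`; on TimescaleDB 2.x the equivalent is
`add_compression_policy`):

```sql
-- Enable native compression on the samples hypertable, segmenting by label set
-- and ordering by time, then compress chunks once they are older than 6 hours.
ALTER TABLE metric_samples SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'lid',
  timescaledb.compress_orderby = 'time'
);

SELECT add_compress_chunks_policy('metric_samples', INTERVAL '6 hours');
```

The per-chunk statistics before and after compression: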

```sql
SELECT
  hypertable_name,
  chunk_name,
  compressed_total_bytes,
  uncompressed_total_bytes
FROM timescaledb_information.compressed_chunk_stats
WHERE compression_status = 'Compressed';

 hypertable_name |               chunk_name                | compressed_total_bytes | uncompressed_total_bytes 
-----------------+-----------------------------------------+------------------------+--------------------------
 metric_samples  | _timescaledb_internal._hyper_1_54_chunk | 9680 kB                | 530 MB
 metric_samples  | _timescaledb_internal._hyper_1_57_chunk | 9712 kB                | 531 MB
 metric_samples  | _timescaledb_internal._hyper_1_58_chunk | 9632 kB                | 530 MB
 metric_samples  | _timescaledb_internal._hyper_1_59_chunk | 9744 kB                | 529 MB
 metric_samples  | _timescaledb_internal._hyper_1_60_chunk | 9616 kB                | 530 MB
(5 rows)

SELECT
  hypertable_name,
  chunk_name,
  1 - pg_size_bytes(compressed_total_bytes)::float / pg_size_bytes(uncompressed_total_bytes) AS compression_ratio
FROM timescaledb_information.compressed_chunk_stats
WHERE compression_status = 'Compressed';

 hypertable_name |               chunk_name                | compression_ratio 
-----------------+-----------------------------------------+-------------------
 metric_samples  | _timescaledb_internal._hyper_1_54_chunk |  0.98216391509434
 metric_samples  | _timescaledb_internal._hyper_1_57_chunk | 0.982138653483992
 metric_samples  | _timescaledb_internal._hyper_1_58_chunk | 0.982252358490566
 metric_samples  | _timescaledb_internal._hyper_1_59_chunk | 0.982012051039698
 metric_samples  | _timescaledb_internal._hyper_1_60_chunk | 0.982281839622641
(5 rows)
```

Compression yielded a 98.21% reduction in total disk space. The `segmentby = 'lid'` and `orderby = 'time'` settings
act like a `GROUP BY lid ORDER BY time DESC` within each chunk: each timeseries, or unique set of labels, is grouped
together, and the time ordering often allows delta compression. For short enough chunks, this leaves a small number
of samples per timeseries: fewer than 2000 in the test cluster. Compressed chunks usually had one row per active
timeseries, indicating that most or all of its samples could be compressed together (the `lid`, and thus the `name`,
are constant and can be stored once).
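
The per-timeseries counts behind that claim can be checked with a simple aggregate over one chunk's time range (a
sketch; the time window is a placeholder covering a single hourly chunk):

```sql
-- Largest timeseries within one hourly chunk; each stayed well under 2000
-- samples in the test cluster, so a chunk compresses to roughly one row per
-- active lid.
SELECT lid, count(*) AS samples
FROM metric_samples
WHERE time BETWEEN '2020-03-01 00:00' AND '2020-03-01 01:00'
GROUP BY lid
ORDER BY samples DESC
LIMIT 10;
```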

With the last 6 hours kept live (uncompressed) and a 24 hour retention period for full resolution samples, the final
reduction in disk space was 84.14% compared to the previous schema. This becomes more effective as the ratio of
active to compressed chunks decreases, with a 94.00% reduction for a 72 hour retention period and upwards of 97%
for a 10 day retention period.
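
Retention can be handled the same way as compression, with a drop policy on the hypertable (a sketch, again
assuming TimescaleDB 1.x, where the function is `add_drop_chunks_policy`; on 2.x it is `add_retention_policy`):

```sql
-- Drop full resolution chunks once they pass the 24 hour retention period.
SELECT add_drop_chunks_policy('metric_samples', INTERVAL '24 hours');
```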

## Further Research

- Removing the `name` from `metric_samples` would reduce row size further and make each row an identical 45 bytes
  (before indexing), with `lid` replacing `name` in the samples index. This would, however, require the query
  planner to hit `metric_labels` first, then `metric_samples` using the `(lid, time)` index within each chunk
  (Timescale itself maps the `time` constraint to a set of chunks). A rough sketch of this layout follows the list.

- Putting `metric_labels` into a hypertable would allow Timescale to drop or compress old labels automatically,
  but each insert could bump a row from an old chunk into a new one, which seems like a performance problem and
  would make it difficult to compress chunks. Most `lid`s that have not been used for a few minutes, usually
  because their pod or job has completed, remain inactive, but there are no guarantees around that.

- Using a TTL cache rather than LRU (or 2Q) would allow the adapter to drop old label sets after some time. The
  maximum size provided to the LRU cache prevents labels from accumulating without limit over time, but does
  nothing to evict labels that have not been seen recently.
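
As a sketch of the first item above, with `name` removed from the samples table and an index on `(lid, time)` in
its place, queries would match labels first and then fetch samples by `lid` (this assumes the `labels` column is
`jsonb`, consistent with the GIN index mentioned earlier, and a `value` column on the samples table; the index
name and label matcher are hypothetical):

```sql
-- Hypothetical replacement index once name is removed from metric_samples.
CREATE INDEX metric_samples_lid_time_idx ON metric_samples (lid, time DESC);

-- Labels are matched first (using the GIN index on labels), then samples are
-- fetched by (lid, time) within the chunks selected by the time constraint.
SELECT s.time, s.lid, s.value
FROM metric_labels AS l
JOIN metric_samples AS s USING (lid)
WHERE l.labels @> '{"__name__": "up"}'
  AND s.time > now() - INTERVAL '1 hour';
```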