Efficient Time Series Management with TimescaleDB at OVHcloud


PGDay Lowlands 2024

Julien RIOU

September 13, 2024

Who am I

Disclaimer

  • Not a Timescale, Inc. employee
  • No partnership with Timescale, Inc.

Disclaimer

Timescale and TimescaleDB are trademarks of Timescale, Inc.

The Grafana Labs Marks are trademarks of Grafana Labs, and are used with Grafana Labs’ permission. We are not affiliated with, endorsed or sponsored by Grafana Labs or its affiliates.

Who are we

Internal databases

Summary

  1. Basics
  2. Metrics platform
  3. Zabbix
  4. Billing
  5. Version management
  6. Tooling
  7. Conclusion
  8. What’s next

TimescaleDB

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.

https://github.com/timescale/timescaledb

Time series

In mathematics, a time series is a series of data points indexed (or listed or graphed) in time order.

https://en.wikipedia.org/wiki/Time_series

Time series

Graph showing the evolution of TimescaleDB stars on GitHub

Time series

Date Stars
2017-08-27 2220
2018-01-24 3330
2018-05-11 4470
2024-07-30 16950
2024-08-08 16992

Features

  • Partition management
  • Compression
  • Aggregation
  • Job scheduling

Partitioning
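TimescaleDB splits a hypertable into time-based chunks. A minimal sketch, using a hypothetical metrics table:

-- Hypothetical table, for illustration only
CREATE TABLE metrics (
    time      timestamptz      NOT NULL,
    device_id integer          NOT NULL,
    value     double precision
);

-- Turn it into a hypertable, partitioned by time into one chunk per day
SELECT create_hypertable('metrics', 'time', chunk_time_interval => INTERVAL '1 day');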

Compression

From

Date Count
2023-03-30 14670
2023-12-23 15810

Compression

To

Date Count
[2023-03-30, 2023-12-23] [14670, 15810]

Plus a compression algorithm chosen based on the data type

Aggregation

  • Materialized view 🐘
    • Cache result of a SELECT query into a table
    • Result needs to be refreshed
  • Continuous aggregate 🐯
    • Refresh materialized view continuously and incrementally
    • Except the most recent data
  • Real time aggregate 🐯
    • Continuous aggregate + most recent data

Job scheduling

  • TimescaleDB policies
    • policy_retention
    • policy_compression
    • policy_refresh_continuous_aggregate
  • User-defined actions
    • add_job(proc, schedule_interval)
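A minimal sketch, assuming a hypothetical conditions hypertable (with compression already enabled) and a hypothetical refresh_stats procedure:

-- Built-in policies create background jobs
SELECT add_retention_policy('conditions', INTERVAL '15 days');
SELECT add_compression_policy('conditions', INTERVAL '7 days');

-- User-defined action: any procedure with this signature can be scheduled
CREATE PROCEDURE refresh_stats(job_id int, config jsonb)
LANGUAGE plpgsql AS $$
BEGIN
    RAISE NOTICE 'refreshing stats (job %)', job_id;
END
$$;
SELECT add_job('refresh_stats', schedule_interval => INTERVAL '1 hour');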

Licenses

  • Community edition
    • Features: advanced chunk operations, compression, continuous aggregates, data retention, jobs, advanced hyperfunctions
    • License: Timescale License (TSL)
    • Permissions: Internal Use (2.1.a) ✅, Value Added Products or Services (2.1.b) ✅, DBaaS to third parties (2.2) ❌
  • Apache 2 edition
    • Features: the rest
    • License: Apache 2.0, approved by the Open Source Initiative (OSI) ✅

FOSS alternatives

Feature                 Software
Time-series management  pg_timeseries
Partition management    pg_partman
Job scheduling          pg_cron, pg_timetable, cron jobs
Compression             Citus columnar, Hydra columnar

Metrics platform

2021

Infrastructure

Problems

  • Incidents with data loss or extended downtime
  • Too many servers
  • Limited maintenance skills
  • Too many query languages (with erlenmeyer)
    • OpenTSDB, PromQL, InfluxQL, Graphite
    • Only compatible with a subset of these ever-changing languages

State of the art


Why

  • Prometheus is a standard
  • We know PostgreSQL
  • Compatible with push mode
  • No single point of failure

TimescaleDB with Promscale

Proof of concept infrastructure with TimescaleDB and Promscale

Deployment

For example only

Performance

  • Access nodes are CPU bound
  • Data nodes are I/O bound

Tuning

For I/O bound workloads

min_wal_size = '64GB'
max_wal_size = '125GB'

  • Checkpoints slowed down ingestion too much
  • PostgreSQL restarts caused excessively long downtimes
  • Do not raise max_wal_size too much

Disk space

700GB in 24 hours

Disk space used by TimescaleDB in 24 hours

Tuning

Configure enough background workers* to compress chunks

timescaledb.max_background_workers = 48

*at least one per database

compressed_chunks count
1 18
0 437
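A per-state chunk count like the one above can be obtained with a query along these lines (a sketch using the timescaledb_information.chunks view):

SELECT is_compressed, count(*)
FROM timescaledb_information.chunks
GROUP BY is_compressed;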

Conclusion for the benchmark

  • Several issues opened during the benchmark
  • Unable to ingest our minimum target
  • Cortex (now Mimir™) was chosen

Side note

Two years later

Zabbix

The Enterprise-Class Open Source Network Monitoring Solution

Generates and stores metrics in PostgreSQL

Table         Number of rows   Size (GB)  Compression  Retention
history       35,789,655,160   452        7d           15d
history_uint  146,317,269,524  1,238      7d           15d
history_log   0                0          7d           15d
history_text  8,644,967,725    196        7d           15d
history_str   13,985           0.2        7d           15d
trends        1,549,631,486    34         7d           180d
trends_uint   6,192,249,880    88         7d           180d

Massive DELETE

The housekeeper (“hk”)

Regular spikes of delete operations

Retention query

Performed by Zabbix

delete from %s where clock < now - n hours;

Enable TimescaleDB

SELECT create_hypertable('history', 'clock', chunk_time_interval => 86400, migrate_data => true);
SELECT create_hypertable('history_uint', 'clock', chunk_time_interval => 86400, migrate_data => true);
SELECT create_hypertable('history_log', 'clock', chunk_time_interval => 86400, migrate_data => true);
SELECT create_hypertable('history_text', 'clock', chunk_time_interval => 86400, migrate_data => true);
SELECT create_hypertable('history_str', 'clock', chunk_time_interval => 86400, migrate_data => true);
SELECT create_hypertable('trends', 'clock', chunk_time_interval => 2592000, migrate_data => true);
SELECT create_hypertable('trends_uint', 'clock', chunk_time_interval => 2592000, migrate_data => true);
BEGIN;
  UPDATE config SET db_extension='timescaledb',hk_history_global=1,hk_trends_global=1;
  UPDATE config SET compression_status=1,compress_older='7d';
COMMIT;

Retention query

Replaced by

select drop_chunks(relation=>'%s',older_than=> now - history_seconds);

Deletes

Fixed

Big databases

⚠️ migrate_data => true ⚠️

Very slow and heavy

Migration plan


Disk space

Compression

  • Configure compression with ALTER TABLE
  • Grant the LOGIN privilege to the table owners (compression jobs run as the owner)
  • Schedule compression jobs
  • Configure and start background workers
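A rough sketch of that setup for one history table (not the exact statements used; Zabbix stores clock as an integer epoch, so intervals are in seconds, and the segmenting choice here is illustrative):

-- Integer time dimension: policies need a "now" function returning the same type
CREATE FUNCTION unix_now() RETURNS integer LANGUAGE sql STABLE AS
$$ SELECT extract(epoch FROM now())::integer $$;
SELECT set_integer_now_func('history', 'unix_now');

-- Enable compression on the hypertable
ALTER TABLE history SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'itemid',
    timescaledb.compress_orderby = 'clock'
);

-- Compress chunks older than 7 days (604800 seconds)
SELECT add_compression_policy('history', compress_after => 604800);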

Disk space

Fixed

I/O

Fixed

Versions

Zabbix       TimescaleDB
7.0          2.13.0-2.15.X
6.4          2.1.0-2.15.X
6.0          2.1.0-2.15.X
5.0.10+      1.X, 2.0.1-2.15.X
5.0.0-5.0.9  1.X

Conclusion for Zabbix

  • Enable TimescaleDB from the beginning
  • Ensure background workers are started
  • Fine-tune for I/O-bound workloads
  • Install supported versions

Billing

Before cloud computing

  • Loop over customers
    • Loop over services
      • Lookup service price
      • Create billing line
    • Create bill

Subscription system

First day of the month

Cloud computing

  • More elasticity
    • Resources can be created and deleted
    • Countless times in a month
  • Pay for what’s really used

Consumption system

Requirements

  • Store consumption metrics
  • Aggregate consumption metrics
  • Write old data points (late ingestion)


Architecture

Billing architecture

Mediation

  • Store raw events
  • Compute daily aggregates

Events (hypertable)

  • 18,000,000 rows per day
  • Chunk interval of 1 day
  • Compress after 6 months
  • Drop after 7 months
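A sketch of that configuration (assuming a time column named begin, as used by the continuous aggregate below):

SELECT create_hypertable('events', 'begin', chunk_time_interval => INTERVAL '1 day');

ALTER TABLE events SET (timescaledb.compress);
SELECT add_compression_policy('events', INTERVAL '6 months');
SELECT add_retention_policy('events', INTERVAL '7 months');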

Daily events (continuous aggregate)

CREATE MATERIALIZED VIEW "daily_events" WITH (timescaledb.continuous) AS
    SELECT
        customer_id, subscription_id, time_bucket('1 day', begin) AS time,
        MIN(duration) AS duration_min, MAX(duration) AS duration_max,
        AVG(duration) AS duration_avg, SUM(duration) AS duration_sum,
        MIN(quantity) AS quantity_min, MAX(quantity) AS quantity_max,
        AVG(quantity) AS quantity_avg, SUM(quantity) AS quantity_sum,
        SUM(1) AS nb_events
    FROM "events"
    WHERE quantity > 0
    GROUP BY time, customer_id, subscription_id
    WITH NO DATA;

Refresh policies

  1. Refresh continuous aggregate
    • [1 hour, 7 days]
    • Every hour
  2. Refresh continuous aggregate (via UDF)
    • [1 hour, 2 months]
    • Every day
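A sketch of the hourly policy (names follow the daily_events aggregate defined earlier):

SELECT add_continuous_aggregate_policy('daily_events',
    start_offset      => INTERVAL '7 days',
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');

-- The daily, wider refresh is a user-defined action calling, for instance:
-- CALL refresh_continuous_aggregate('daily_events', now() - INTERVAL '2 months', now() - INTERVAL '1 hour');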

Continuous aggregate

Real-time aggregate

ALTER MATERIALIZED VIEW "daily_events"
    SET (timescaledb.materialized_only = false);


Rating

  • Query “Mediation” aggregation
  • Create billing lines (“ratings”)
  • 122,000,000 rows in hot table
    • Uncompressed
  • 3,400,000,000 rows in archive table
    • Compressed after 3 days
  • Chunk interval of 1 day
  • Retention of 7 months

Schema migrations

Let’s add a simple “id” column

alter table massive_archive
    add column id bigserial not null;

ERROR: cannot add column with non-constant default expression to a hypertable that has compression enabled

Try again

  1. Decompress chunks
  2. Add the column
  3. Compress chunks
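Roughly, that attempt looks like this (a sketch):

-- 1. Decompress every chunk of the hypertable
SELECT decompress_chunk(c, if_compressed => true)
FROM show_chunks('massive_archive') AS c;

-- 2. Add the column
ALTER TABLE massive_archive ADD COLUMN id bigserial NOT NULL;

-- 3. Compress the chunks again
SELECT compress_chunk(c, if_not_compressed => true)
FROM show_chunks('massive_archive') AS c;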

Disk full

Try again

  1. Create a second hypertable
  2. For each chunk (oldest to newest)
    1. Copy rows
    2. Compress chunk
  3. Swap hypertables
  4. Drop unused hypertables
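A rough sketch of that plan (column names and the time range are hypothetical; in practice a script loops over every chunk, oldest first):

-- 1. New hypertable with the extra column, compression enabled
CREATE TABLE massive_archive_new (LIKE massive_archive INCLUDING ALL);
ALTER TABLE massive_archive_new ADD COLUMN id bigserial NOT NULL;
SELECT create_hypertable('massive_archive_new', 'time', chunk_time_interval => INTERVAL '1 day');
ALTER TABLE massive_archive_new SET (timescaledb.compress);

-- 2. Copy one source chunk's time range at a time, then compress it
INSERT INTO massive_archive_new (time, subscription_id, quantity)
    SELECT time, subscription_id, quantity
    FROM massive_archive
    WHERE time >= '2024-01-01' AND time < '2024-01-02';
SELECT compress_chunk(c, if_not_compressed => true)
FROM show_chunks('massive_archive_new', older_than => TIMESTAMPTZ '2024-01-02') AS c;

-- 3. and 4. Swap the hypertables and drop the old one
ALTER TABLE massive_archive RENAME TO massive_archive_old;
ALTER TABLE massive_archive_new RENAME TO massive_archive;
DROP TABLE massive_archive_old;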

Fixed

Compression tips

  • Do not segment by very high cardinality values
  • Do not segment by very low cardinality values
  • ~1,000 tuples per batch

Compression example

Time ID Subscription Quantity
2024-09-04 00:00:00Z 1 1 10
2024-09-04 00:00:00Z 2 2 20
2024-09-04 01:00:00Z 3 1 10
2024-09-04 01:00:00Z 4 2 30

Segment by ID

Very high cardinality

Time ID Subscription Quantity
[2024-09-04 00:00:00Z] [1] [1] [10]
[2024-09-04 00:00:00Z] [2] [2] [20]
[2024-09-04 01:00:00Z] [3] [1] [10]
[2024-09-04 01:00:00Z] [4] [2] [30]

Segment by subscription

Natural compression

Time ID Subscription Quantity
[2024-09-04 00:00:00Z,2024-09-04 01:00:00Z] [1,3] [1,1] [10,10]
[2024-09-04 00:00:00Z,2024-09-04 01:00:00Z] [2,4] [2,2] [20,30]
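The segmenting column is chosen when enabling compression; a sketch matching the example above (hypothetical table and column names):

ALTER TABLE ratings SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'subscription_id',
    timescaledb.compress_orderby = 'time'
);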

Fixed

Conclusion for Billing

  • No more load spike on the first day of the month
  • Easier to predict the amount of the next bill

Version management

  • TimescaleDB is in active development
  • Lots of performance improvements on continuous aggregates
  • Fewer segmentation faults

How to upgrade?

  1. Read the changelog
  2. Update packages on the operating system
  3. Update the extension
psql -X
ALTER EXTENSION timescaledb UPDATE;

Or via the community.postgresql.postgresql_ext Ansible module

Restart if needed

Management of PostgreSQL extension failed: loader version out-of-date
HINT: Please restart the database to upgrade the loader version.

Tooling

check_timescaledb

  • Nagios probe to check that
    • Background workers are started
    • Jobs have no errors
    • Version is up-to-date
  • Part of the OVHcloud Nagios Plugins
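The version check, for instance, boils down to a query along these lines (a sketch):

SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name = 'timescaledb';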


What’s next?

  • Migrate remaining Zabbix deployments
  • Migrate remaining billing units
  • Change data capture

Takeaways

  • Prepare for high I/O workloads
  • Ensure everything is properly configured from the beginning
  • Moving large live datasets is challenging

Thanks

  • My colleagues because we are One Team
  • Timescale, Inc. for providing and maintaining TimescaleDB
  • You for contributing to the PostgreSQL community

🙏

Questions
