Efficient Time Series Management with TimescaleDB at OVHcloud


PGDay Lowlands 2024

Julien RIOU

September 13, 2024

Who am I

Disclaimer

  • Not a Timescale, Inc. employee
  • No partnership with Timescale, Inc.

Disclaimer

Timescale and TimescaleDB are trademarks of Timescale, Inc.

The Grafana Labs Marks are trademarks of Grafana Labs, and are used with Grafana Labs’ permission. We are not affiliated with, endorsed or sponsored by Grafana Labs or its affiliates.

Who are we

Internal databases

Summary

  1. Basics
  2. Metrics platform
  3. Zabbix
  4. Billing
  5. Version management
  6. Tooling
  7. Conclusion
  8. What’s next

TimescaleDB

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.

https://github.com/timescale/timescaledb

Time series

In mathematics, a time series is a series of data points indexed (or listed or graphed) in time order.

https://en.wikipedia.org/wiki/Time_series

Time series

Graph showing the evolution of TimescaleDB stars on GitHub

Time series

Date Stars
2017-08-27 2220
2018-01-24 3330
2018-05-11 4470
2024-07-30 16950
2024-08-08 16992

Features

  • Partition management
  • Compression
  • Aggregation
  • Job scheduling

Partitioning
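TimescaleDB splits a hypertable into time-based chunks. A minimal sketch, using a hypothetical metrics table:

-- Hypothetical table, for illustration only
CREATE TABLE metrics (
    time      timestamptz      NOT NULL,
    device_id integer          NOT NULL,
    value     double precision
);

-- Turn it into a hypertable, partitioned by time into one chunk per day
SELECT create_hypertable('metrics', 'time', chunk_time_interval => INTERVAL '1 day');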

Compression

From

Date Count
2023-03-30 14670
2023-12-23 15810

Compression

To

Date Count
[2023-03-30, 2023-12-23] [14670, 15810]

Plus a compression algorithm chosen based on the data type

Aggregation

  • Materialized view 🐘
    • Cache result of a SELECT query into a table
    • Result needs to be refreshed
  • Continuous aggregate 🐯
    • Refresh materialized view continuously and incrementally
    • Except the most recent data
  • Real time aggregate 🐯
    • Continuous aggregate + most recent data

Job scheduling

  • TimescaleDB policies
    • policy_retention
    • policy_compression
    • policy_refresh_continuous_aggregate
  • User-defined actions
    • add_job(proc, schedule_interval)
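A minimal sketch, assuming a hypothetical conditions hypertable (with compression already enabled) and a hypothetical refresh_stats procedure:

-- Built-in policies create background jobs
SELECT add_retention_policy('conditions', INTERVAL '15 days');
SELECT add_compression_policy('conditions', INTERVAL '7 days');

-- User-defined action: any procedure with this signature can be scheduled
CREATE PROCEDURE refresh_stats(job_id int, config jsonb)
LANGUAGE plpgsql AS $$
BEGIN
    RAISE NOTICE 'refreshing stats (job %)', job_id;
END
$$;
SELECT add_job('refresh_stats', schedule_interval => INTERVAL '1 hour');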

Licenses

  • Community edition
    • Features: advanced chunk operations, compression, continuous aggregates, data retention, jobs, advanced hyperfunctions
    • License: Timescale License (TSL)
    • Permissions: Internal Use (2.1.a) ✅, Value Added Products or Services (2.1.b) ✅, DBaaS to third parties (2.2) ❌
  • Apache 2 edition
    • Features: the rest
    • License: Apache 2.0, approved by the Open Source Initiative (OSI) ✅

FOSS alternatives

Feature                 Software
Time-series management  pg_timeseries
Partition management    pg_partman
Job scheduling          pg_cron, pg_timetable, cron jobs
Compression             Citus columnar, Hydra columnar

Metrics platform

2021

Infrastructure

Problems

  • Incidents with data loss or extended downtime
  • Too many servers
  • Limited maintenance skills
  • Too many query languages (with erlenmeyer)
    • OpenTSDB, PromQL, InfluxQL, Graphite
    • Only compatible with a subset of these ever-changing languages

State of the art


Why

  • Prometheus is a standard
  • We know PostgreSQL
  • Compatible with push mode
  • No single point of failure

TimescaleDB with Promscale

Proof of concept infrastructure with TimescaleDB and Promscale

Deployment

For example only

Performance

  • Access nodes are CPU bound
  • Data nodes are I/O bound

Tuning

For I/O bound workloads

min_wal_size = '64GB'
max_wal_size = '125GB'

  • Checkpoints slowed down ingestion too much
  • PostgreSQL restarts caused excessively long downtimes
  • Do not raise max_wal_size too much

Disk space

700GB in 24 hours

Disk space used by TimescaleDB in 24 hours

Tuning

Configure enough background workers* to compress chunks

timescaledb.max_background_workers = 48

*at least one per database

compressed_chunks count
1 18
0 437
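A per-state chunk count like the one above can be obtained with a query along these lines (a sketch using the timescaledb_information.chunks view):

SELECT is_compressed, count(*)
FROM timescaledb_information.chunks
GROUP BY is_compressed;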

Conclusion for the benchmark

  • Several issues opened during the benchmark
  • Unable to ingest our minimum target
  • Cortex (now Mimir™) was chosen

Side note

Two years later

Zabbix

The Enterprise-Class Open Source Network Monitoring Solution

Generates and stores metrics in PostgreSQL

Table         Number of rows   Size (GB)  Compression  Retention
history       35,789,655,160   452        7d           15d
history_uint  146,317,269,524  1,238      7d           15d
history_log   0                0          7d           15d
history_text  8,644,967,725    196        7d           15d
history_str   13,985           0.2        7d           15d
trends        1,549,631,486    34         7d           180d
trends_uint   6,192,249,880    88         7d           180d

Massive DELETE

The housekeeper (“hk”)

Regular spikes of delete operations

Retention query

Performed by Zabbix

delete from %s where clock < now - n hours;

Enable TimescaleDB

SELECT create_hypertable('history', 'clock', chunk_time_interval => 86400, migrate_data => true);
SELECT create_hypertable('history_uint', 'clock', chunk_time_interval => 86400, migrate_data => true);
SELECT create_hypertable('history_log', 'clock', chunk_time_interval => 86400, migrate_data => true);
SELECT create_hypertable('history_text', 'clock', chunk_time_interval => 86400, migrate_data => true);
SELECT create_hypertable('history_str', 'clock', chunk_time_interval => 86400, migrate_data => true);
SELECT create_hypertable('trends', 'clock', chunk_time_interval => 2592000, migrate_data => true);
SELECT create_hypertable('trends_uint', 'clock', chunk_time_interval => 2592000, migrate_data => true);
BEGIN;
  UPDATE config SET db_extension='timescaledb',hk_history_global=1,hk_trends_global=1;
  UPDATE config SET compression_status=1,compress_older='7d';
COMMIT;

Retention query

Replaced by

select drop_chunks(relation=>'%s',older_than=> now - history_seconds);

Deletes

Fixed

Big databases

⚠️ migrate_data => true ⚠️

Very slow and heavy

Migration plan


Disk space

Compression

  • Configure compression with ALTER TABLE
  • Grant the LOGIN privilege to the table owners (compression jobs run as the owner)
  • Schedule compression jobs
  • Configure and start background workers
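A rough sketch of that setup for one history table (not the exact statements used; Zabbix stores clock as an integer epoch, so intervals are in seconds, and the segmenting choice here is illustrative):

-- Integer time dimension: policies need a "now" function returning the same type
CREATE FUNCTION unix_now() RETURNS integer LANGUAGE sql STABLE AS
$$ SELECT extract(epoch FROM now())::integer $$;
SELECT set_integer_now_func('history', 'unix_now');

-- Enable compression on the hypertable
ALTER TABLE history SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'itemid',
    timescaledb.compress_orderby = 'clock'
);

-- Compress chunks older than 7 days (604800 seconds)
SELECT add_compression_policy('history', compress_after => 604800);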

Disk space

Fixed

I/O

Fixed

Versions

Zabbix       TimescaleDB
7.0          2.13.0-2.15.X
6.4          2.1.0-2.15.X
6.0          2.1.0-2.15.X
5.0.10+      1.X, 2.0.1-2.15.X
5.0.0-5.0.9  1.X

Conclusion for Zabbix

  • Enable TimescaleDB from the beginning
  • Ensure background workers are started
  • Fine-tune for I/O-bound workloads
  • Install supported versions

Billing

Before cloud computing

  • Loop over customers
    • Loop over services
      • Lookup service price
      • Create billing line
    • Create bill

Subscription system

First day of the month

Cloud computing

  • More elasticity
    • Resources can be created and deleted
    • Countless times in a month
  • Pay for what’s really used

Consumption system

Requirements

  • Store consumption metrics
  • Aggregate consumption metrics
  • Write old data points (late ingestion)


Architecture

Billing architecture

Mediation

  • Store raw events
  • Compute daily aggregates

Events (hypertable)

  • 18,000,000 rows per day
  • Chunk interval of 1 day
  • Compress after 6 months
  • Drop after 7 months
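A sketch of that configuration (assuming a time column named begin, as used by the continuous aggregate below):

SELECT create_hypertable('events', 'begin', chunk_time_interval => INTERVAL '1 day');

ALTER TABLE events SET (timescaledb.compress);
SELECT add_compression_policy('events', INTERVAL '6 months');
SELECT add_retention_policy('events', INTERVAL '7 months');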

Daily events (continuous aggregate)

CREATE MATERIALIZED VIEW "daily_events" WITH (timescaledb.continuous) AS
    SELECT
        customer_id, subscription_id, time_bucket('1 day', begin) AS time,
        MIN(duration) AS duration_min, MAX(duration) AS duration_max,
        AVG(duration) AS duration_avg, SUM(duration) AS duration_sum,
        MIN(quantity) AS quantity_min, MAX(quantity) AS quantity_max,
        AVG(quantity) AS quantity_avg, SUM(quantity) AS quantity_sum,
        SUM(1) AS nb_events
    FROM "events"
    WHERE quantity > 0
    GROUP BY time, customer_id, subscription_id
    WITH NO DATA;

Refresh policies

  1. Refresh continuous aggregate
    • [1 hour, 7 days]
    • Every hour
  2. Refresh continuous aggregate (via UDF)
    • [1 hour, 2 months]
    • Every day
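A sketch of the hourly policy (names follow the daily_events aggregate defined earlier):

SELECT add_continuous_aggregate_policy('daily_events',
    start_offset      => INTERVAL '7 days',
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');

-- The daily, wider refresh is a user-defined action calling, for instance:
-- CALL refresh_continuous_aggregate('daily_events', now() - INTERVAL '2 months', now() - INTERVAL '1 hour');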

Continuous aggregate

Real-time aggregate

ALTER MATERIALIZED VIEW "daily_events"
    SET (timescaledb.materialized_only = false);


Rating

  • Query “Mediation” aggregation
  • Create billing lines (“ratings”)
  • 122,000,000 rows in hot table
    • Uncompressed
  • 3,400,000,000 rows in archive table
    • Compressed after 3 days
  • Chunk interval of 1 day
  • Retention of 7 months

Schema migrations

Let’s add a simple “id” column

alter table massive_archive
    add column id bigserial not null;

ERROR: cannot add column with non-constant default expression to a hypertable that has compression enabled

Try again

  1. Decompress chunks
  2. Add the column
  3. Compress chunks
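Roughly, that attempt looks like this (a sketch):

-- 1. Decompress every chunk of the hypertable
SELECT decompress_chunk(c, if_compressed => true)
FROM show_chunks('massive_archive') AS c;

-- 2. Add the column
ALTER TABLE massive_archive ADD COLUMN id bigserial NOT NULL;

-- 3. Compress the chunks again
SELECT compress_chunk(c, if_not_compressed => true)
FROM show_chunks('massive_archive') AS c;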

Disk full

Try again

  1. Create a second hypertable
  2. For each chunk (oldest to newest)
    1. Copy rows
    2. Compress chunk
  3. Swap hypertables
  4. Drop unused hypertables
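A rough sketch of that plan (column names and the time range are hypothetical; in practice a script loops over every chunk, oldest first):

-- 1. New hypertable with the extra column, compression enabled
CREATE TABLE massive_archive_new (LIKE massive_archive INCLUDING ALL);
ALTER TABLE massive_archive_new ADD COLUMN id bigserial NOT NULL;
SELECT create_hypertable('massive_archive_new', 'time', chunk_time_interval => INTERVAL '1 day');
ALTER TABLE massive_archive_new SET (timescaledb.compress);

-- 2. Copy one source chunk's time range at a time, then compress it
INSERT INTO massive_archive_new (time, subscription_id, quantity)
    SELECT time, subscription_id, quantity
    FROM massive_archive
    WHERE time >= '2024-01-01' AND time < '2024-01-02';
SELECT compress_chunk(c, if_not_compressed => true)
FROM show_chunks('massive_archive_new', older_than => TIMESTAMPTZ '2024-01-02') AS c;

-- 3. and 4. Swap the hypertables and drop the old one
ALTER TABLE massive_archive RENAME TO massive_archive_old;
ALTER TABLE massive_archive_new RENAME TO massive_archive;
DROP TABLE massive_archive_old;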

Fixed

Compression tips

  • Do not segment by very high cardinality values
  • Do not segment by very low cardinality values
  • ~1,000 tuples per batch

Compression example

Time ID Subscription Quantity
2024-09-04 00:00:00Z 1 1 10
2024-09-04 00:00:00Z 2 2 20
2024-09-04 01:00:00Z 3 1 10
2024-09-04 01:00:00Z 4 2 30

Segment by ID

Very high cardinality

Time ID Subscription Quantity
[2024-09-04 00:00:00Z] [1] [1] [10]
[2024-09-04 00:00:00Z] [2] [2] [20]
[2024-09-04 01:00:00Z] [3] [1] [10]
[2024-09-04 01:00:00Z] [4] [2] [30]

Segment by subscription

Natural compression

Time ID Subscription Quantity
[2024-09-04 00:00:00Z,2024-09-04 01:00:00Z] [1,3] [1,1] [10,10]
[2024-09-04 00:00:00Z,2024-09-04 01:00:00Z] [2,4] [2,2] [20,30]
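The segmenting column is chosen when enabling compression; a sketch matching the example above (hypothetical table and column names):

ALTER TABLE ratings SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'subscription_id',
    timescaledb.compress_orderby = 'time'
);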

Fixed

Conclusion for Billing

  • No more load spike on the first day of the month
  • Easier to predict the amount of the next bill

Version management

  • TimescaleDB is in active development
  • Lots of performance improvements on continuous aggregates
  • Fewer segmentation faults

How to upgrade?

  1. Read the changelog
  2. Update packages on the operating system
  3. Update the extension
psql -X
ALTER EXTENSION timescaledb UPDATE;

Or via the community.postgresql.postgresql_ext Ansible module

Restart if needed

Management of PostgreSQL extension failed: loader version out-of-date
HINT: Please restart the database to upgrade the loader version.

Tooling

check_timescaledb

  • Nagios probe to check that
    • Background workers are started
    • Jobs have no errors
    • Version is up-to-date
  • Part of the OVHcloud Nagios Plugins
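The version check, for instance, boils down to a query along these lines (a sketch):

SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name = 'timescaledb';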


What’s next?

  • Migrate remaining Zabbix deployments
  • Migrate remaining billing units
  • Change data capture

Takeaways

  • Prepare for high I/O workloads
  • Ensure everything is properly configured from the beginning
  • Moving large live datasets is challenging

Thanks

  • My colleagues because we are One Team
  • Timescale, Inc. for providing and maintaining TimescaleDB
  • You for contributing to the PostgreSQL community

🙏

Questions
