Speaker

Julien RIOU
Open Source DBA
https://julien.riou.xyz
@jriou@hachyderm.io

Summary

Basics of schema migrations
Who are we?
Chaos
Schema as code, CI/CD, code reviews
Inventory, automation
Heaven
Known issues
Takeaways

Basics of schema migrations

Schema

Schema changes over time

New tables
New columns
New constraints
New everything

Schema migration

Data definition language (DDL)

ALTER TABLE customer
    ADD COLUMN archived_at TIMESTAMP WITH TIME ZONE;

Who are we?

Internal databases

Statistics

7 autonomous infrastructures worldwide
100+ clusters
500+ servers
2000+ databases
MySQL and PostgreSQL
Highly secure environments

Cluster example

Chaos

Who can ALTER my database?

Startup mindset

All employees work in the same openspace
A developer comes to the DBA desk
Can you do X on my database, please?
DBA SSH to the production database host
psql database
Translate X to queries and run them
Job done

The DBA forgot on which host the database is

The DBA misunderstood what developer really wanted

The DBA executed the query in Europe but forgot Canada

The DBA forgot to create a transaction and one query failed in the middle

The DBA started a transaction but forgot to commit

The DBA created a query that runs for too long and locks production

The DBA forgot to use SET ROLE and permissions are now broken

Blame the DBA!

Ticketing

Let’s write migrations in a ticketing system

But

Multiple ticketing systems for different teams
SQL statement are not well formatted
Bad copy pastes
Poor reviewing system

Schema as code

Git

One repository for one or more databases
- 534 repositories
Well known by developers

sql-migrate

SQL schema migration tool for Go

https://github.com/rubenv/sql-migrate

Developers create code:

v0001.sql

-- +migrate Up
create table x (...);

-- +migrate Down
drop table x;

One file = one migration = one transaction
Transactions can be disabled with notransaction hint
- CREATE INDEX CONCURRENTLY
Focus on schema (DDL), not data (DML)

DBAs deploy with a command:

sql-migrate-wrapper

Handle SET ROLE
Two migration paths
- admin: run as superuser to create extensions
- normal: run with DDL privileges to create objects

# sql-migrate-wrapper status -config /etc/sqlmigrate/test.json 
INFO[2024-02-13T10:28:47Z] Processing 'test' database                   
INFO[2024-02-13T10:28:47Z] Processing normal migration path             
+-----------+--------------------------------------+
| MIGRATION |               APPLIED                |
+-----------+--------------------------------------+
| v0001.sql | 2022-03-03 15:33:39.709067 +0100 CET |
| v0002.sql | 2022-11-30 12:43:34.947511 +0100 CET |
| v0003.sql | 2022-11-30 12:53:27.437182 +0100 CET |
| v0004.sql | 2023-01-24 17:20:22.330472 +0100 CET |
| v0005.sql | 2023-05-17 12:32:39.87181 +0200 CEST |
+-----------+--------------------------------------+

Code reviews

Git based collaboration tool
Adopted by the company
Not opensource

Features

Comments
Blocking tasks
Review status (approved, need works)
Minimum number of required approvals
- From DBAs
- From database owners

CI/CD

Terminology

Continuous Integration
- Test locally
- Ensure software is deliverable
- Merge changes to the main branch
Continuous Delivery
- Build a release
Continous Deployment
- Build a release and deploy it
- No human intervention

Tests without data

CDS

CDS is an Enterprise-Grade Continuous Delivery & DevOps Automation Open Source Platform.

https://ovh.github.io/cds/

Implementation

Project: all schema tests at the same place
Application: one git repository (schema)
Pipeline: run sql-migrate on PostgreSQL 14
Workflow: link application to pipelines

Web hooks

Tests with data

Deployment

Workflow

Concurrent sessions

lock_timeout (2000)

Abort any statement that waits longer than the specified amount of time while attempting to acquire a lock on a table, index, row, or other database object.

Inventory

Where should I connect to run the migrations?

Custom inventory

In a database of databases:

repositories to databases
databases to hosts

REST API

$ curl https://<api>/database?name=test
{
  "code": 200,
  "data": [
    {
      "schemarepository": "<uuid>",
      "name": "test",
      "id": "<uuid>",
      "cluster": "<uuid>"
    }
  ]
}

But

Every change has to be declared
- Add a cluster, host, database, git repository, …
- Remove a cluster, host, database, git repository, …
Easy to miss an event
100% home-made

Service discovery

Consul

Distributed and highly-available data store
Local agent
https://github.com/hashicorp/consul

Consul

Consul service discovery

Consul

Nodes
- name, IP address, meta(data)
Services
- databases
Access control list (ACL) with tokens
Encryption

Static configuration

Node meta
- server_type
  - postgresql, mysql, filer, …
- role
  - node, lb, backup, …
- cluster identifier

Dynamic configuration

Node “subrole”
- primary, replica
Database services

Where is my database?

Consul service

What databases are behind the repository?

Special file in the repository meta.json

{
  "databases": {
    "US": [
      "test"
    ]
  }
}

Automation

How to apply migrations safely?

Ansible

ansible

Ansible is a radically simple IT automation system.

https://github.com/ansible/ansible

Concepts

Inventory: combinaison of
- Hosts: remote machine to manage
- Groups: hosts sharing a common attribute
Playbook: list of tasks executed in order, on groups
- Tasks: module + arguments
  - Modules: smallest unit of code to execute on hosts

Playbook overview

- name: Check arguments
  hosts: all
  run_once: true
  delegate_to: localhost
  tasks:
    - name: Check variable schema_url    # fail fast
    - name: Check variable database_name # fail fast

- name: Update database to the latest schema migration
  hosts: "{{ database_name }}:&subrole_primary"
  tasks:
    - name: Create sql-migrate directories
    - name: Create sql-migrate configuration file
    - name: Clone schema
    - name: Run migrations

Playbook tasks

- name: Create sql-migrate directories
  ansible.builtin.file:
    path: "{{ item }}"
    state: directory
  loop:
    - /etc/sqlmigrate
    - /var/lib/sqlmigrate

- name: Create sql-migrate configuration file
  ansible.builtin.template:
    src: sqlmigrate/database.yml.j2
    dest: "/etc/sqlmigrate/{{ database_name }}.yml"

- name: Clone schema repository
  ansible.builtin.git:
    repo: "{{ schema_url }}"
    dest: "/var/lib/sqlmigrate/{{ database_name }}"
    version: "{{ branch|default('master') }}" # branch or tag
    force: true
  environment:
    TMPDIR: /run

- name: Run migrations
  ansible.builtin.command:
    cmd: sql-migrate-wrapper up -config /etc/sqlmigrate/{{ database_name }}.json

Ansible playbook command

$ ansible-playbook \
    --extra-vars \
        '{"database_name": "test",
          "branch": "master",
          "schema_url": "ssh://git@hostname:port/project/repository.git"
          }' \
    database-primary-schema-update.yml

Ansible + Consul

Dynamic inventory
https://github.com/wilfriedroset/consul-awx

The Bastion (SSH)

Fine-grained access to infrastructure
Sessions can be recorded
Used in secure environment
Already used by humans
https://github.com/ovh/the-bastion

Ansible + The Bastion

Ansible Wrapper

[ssh_connection]
pipelining = True
private_key_file = ~/.ssh/id_ed25519
ssh_executable = /usr/share/ansible/plugins/bastion/sshwrapper.py
sftp_executable = /usr/share/ansible/plugins/bastion/sftpbastion.sh
transfer_method = sftp
retries = 3

https://github.com/ovh/the-bastion-ansible-wrapper

AWX

AWX provides a web-based user interface, REST API, and task engine built on top of Ansible.

https://github.com/ansible/awx

AWX features

REST API
Web interface
CLI
Personal accounts (via SSO/SAML)
Notifications (alerting, chat)
Scheduled jobs
Inventory management
Source Control management
Machine credentials

AWX Web interface

AWX CLI

awx -f human job_templates launch --monitor --extra_vars \
    '{"database_name": "test",
      "branch": "master",
      "schema_url": "ssh://git@hostname:port/project/repository.git"
      }' \
    database-primary-schema-update

Event detection

How to deploy without human intervention?

Producer

Consumer

Current architecture

Migration failures

AWX notifications
- Chat (Webex)
- On-call alerts (OpsGenie)
NRPE probes (Icinga)

In the end

Heaven

Known issues

Highly concurrent repositories

Multiple pull requests at the same time
Multiple files
From various developpers
In the same database repository

Same file names

Predicted file names

Solution

Use a declarative model:

atlas (MySQL, PostgreSQL)
skeema (MySQL only)

Granular deployment

Changes are deployed everywhere at the same time
Problems may rise directly on production

Solution

Test on real data

“DEV” environment is not available everywhere
Anonimization is hard
This environment will be removed

Solution

schema-consumer

100% home-made
Written in Python
Add missing features
Patch vulnerabilities
Not open

Solution

Event-Driven Ansible

Free software
Same features (at least)
- Events from Kafka
- Create jobs on AWX
Easy to use
- Rulebooks in YAML, similar to playbooks
Easy to contribute
- Add Kafka SASL_SSL authentication (#177)
- add basic auth to controller (#602)

Takeaways

SQL is your friend
Automation is easy
Start small then iterate
Learn from your errors

Database schema management for lazybones: from chaos to heaven

Speaker

Summary

Basics of schema migrations

Schema

Schema changes over time

Schema migration

Who are we?

Internal databases

Statistics

Cluster example

Chaos

Who can ALTER my database?

Startup mindset

Blame the DBA!

Ticketing

But

Schema as code

Git

sql-migrate

Code reviews

Features

CI/CD

Terminology

Tests without data

Implementation

Web hooks

Tests with data

Deployment

Workflow

Concurrent sessions

Inventory

Custom inventory

REST API

But

Service discovery

Consul

Consul

Static configuration

Dynamic configuration

Where is my database?

What databases are behind the repository?

Automation

Ansible

Concepts

Playbook overview

Playbook tasks

Ansible playbook command

Ansible + Consul

The Bastion (SSH)

Ansible + The Bastion

AWX

AWX features

AWX Web interface

AWX CLI

Event detection

Producer

Consumer

Current architecture

Migration failures

In the end

Heaven

Known issues

Highly concurrent repositories

Same file names

Predicted file names

Solution

Granular deployment

Solution

Test on real data

Solution

schema-consumer

Solution

Takeaways

Thank you

Questions