Database schema management for lazybones: from chaos to heaven

intro_logo

SCALE 21x

Julien RIOU

March 15, 2024

Speaker

Summary

  1. Basics of schema migrations
  2. Who are we?
  3. Chaos
  4. Schema as code, CI/CD, code reviews
  5. Inventory, automation
  6. Heaven
  7. Known issues
  8. Takeaways

Basics of schema migrations

Schema

Schema changes over time

  • New tables
  • New columns
  • New constraints
  • New everything

Schema migration

Data definition language (DDL)

ALTER TABLE customer
    ADD COLUMN archived_at TIMESTAMP WITH TIME ZONE;

Who are we?

Internal databases

Statistics

  • 7 autonomous infrastructures worldwide
  • 100+ clusters
  • 500+ servers
  • 2000+ databases
  • MySQL and PostgreSQL
  • Highly secure environments

Cluster example

Chaos

Who can ALTER my database?

Startup mindset

  • All employees work in the same openspace
  • A developer comes to the DBA desk
  • Can you do X on my database, please?
  • DBA SSH to the production database host
  • psql database
  • Translate X to queries and run them
  • Job done

The DBA forgot on which host the database is

The DBA misunderstood what developer really wanted

The DBA executed the query in Europe but forgot Canada

The DBA forgot to create a transaction and one query failed in the middle

The DBA started a transaction but forgot to commit

The DBA created a query that runs for too long and locks production

The DBA forgot to use SET ROLE and permissions are now broken

Blame the DBA!

Ticketing

Let’s write migrations in a ticketing system

But

  • Multiple ticketing systems for different teams
  • SQL statement are not well formatted
  • Bad copy pastes
  • Poor reviewing system

Schema as code

Git

  • One repository for one or more databases
    • 534 repositories
  • Well known by developers

sql-migrate

SQL schema migration tool for Go

https://github.com/rubenv/sql-migrate

Developers create code:

v0001.sql

-- +migrate Up
create table x (...);

-- +migrate Down
drop table x;

  • One file = one migration = one transaction
  • Transactions can be disabled with notransaction hint
    • CREATE INDEX CONCURRENTLY
  • Focus on schema (DDL), not data (DML)

DBAs deploy with a command:

sql-migrate-wrapper

  • Handle SET ROLE
  • Two migration paths
    • admin: run as superuser to create extensions
    • normal: run with DDL privileges to create objects
# sql-migrate-wrapper status -config /etc/sqlmigrate/test.json 
INFO[2024-02-13T10:28:47Z] Processing 'test' database                   
INFO[2024-02-13T10:28:47Z] Processing normal migration path             
+-----------+--------------------------------------+
| MIGRATION |               APPLIED                |
+-----------+--------------------------------------+
| v0001.sql | 2022-03-03 15:33:39.709067 +0100 CET |
| v0002.sql | 2022-11-30 12:43:34.947511 +0100 CET |
| v0003.sql | 2022-11-30 12:53:27.437182 +0100 CET |
| v0004.sql | 2023-01-24 17:20:22.330472 +0100 CET |
| v0005.sql | 2023-05-17 12:32:39.87181 +0200 CEST |
+-----------+--------------------------------------+

Code reviews

  • Git based collaboration tool
  • Adopted by the company
  • Not opensource

Features

  • Comments
  • Blocking tasks
  • Review status (approved, need works)
  • Minimum number of required approvals
    • From DBAs
    • From database owners

CI/CD

Terminology

  • Continuous Integration
    • Test locally
    • Ensure software is deliverable
    • Merge changes to the main branch
  • Continuous Delivery
    • Build a release
  • Continous Deployment
    • Build a release and deploy it
    • No human intervention

Tests without data

CDS

CDS is an Enterprise-Grade Continuous Delivery & DevOps Automation Open Source Platform.

https://ovh.github.io/cds/

Implementation

  • Project: all schema tests at the same place
  • Application: one git repository (schema)
  • Pipeline: run sql-migrate on PostgreSQL 14
  • Workflow: link application to pipelines

Web hooks

Tests with data

Deployment

Workflow

Concurrent sessions

lock_timeout (2000)

Abort any statement that waits longer than the specified amount of time while attempting to acquire a lock on a table, index, row, or other database object.

Inventory

Where should I connect to run the migrations?

Custom inventory

In a database of databases:

  • repositories to databases
  • databases to hosts

REST API

$ curl https://<api>/database?name=test
{
  "code": 200,
  "data": [
    {
      "schemarepository": "<uuid>",
      "name": "test",
      "id": "<uuid>",
      "cluster": "<uuid>"
    }
  ]
}

But

  • Every change has to be declared
    • Add a cluster, host, database, git repository, …
    • Remove a cluster, host, database, git repository, …
  • Easy to miss an event
  • 100% home-made

Service discovery

Consul

Consul

Consul service discovery

Consul

  • Nodes
    • name, IP address, meta(data)
  • Services
    • databases
  • Access control list (ACL) with tokens
  • Encryption

Static configuration

  • Node meta
    • server_type
      • postgresql, mysql, filer, …
    • role
      • node, lb, backup, …
    • cluster identifier

Dynamic configuration

  • Node “subrole”
    • primary, replica
  • Database services

Where is my database?

Consul service

What databases are behind the repository?

Special file in the repository meta.json

{
  "databases": {
    "US": [
      "test"
    ]
  }
}

Automation

How to apply migrations safely?

Ansible

ansible

Ansible is a radically simple IT automation system.

https://github.com/ansible/ansible

Concepts

  • Inventory: combinaison of
    • Hosts: remote machine to manage
    • Groups: hosts sharing a common attribute
  • Playbook: list of tasks executed in order, on groups
    • Tasks: module + arguments
      • Modules: smallest unit of code to execute on hosts

Playbook overview

- name: Check arguments
  hosts: all
  run_once: true
  delegate_to: localhost
  tasks:
    - name: Check variable schema_url    # fail fast
    - name: Check variable database_name # fail fast

- name: Update database to the latest schema migration
  hosts: "{{ database_name }}:&subrole_primary"
  tasks:
    - name: Create sql-migrate directories
    - name: Create sql-migrate configuration file
    - name: Clone schema
    - name: Run migrations

Playbook tasks

- name: Create sql-migrate directories
  ansible.builtin.file:
    path: "{{ item }}"
    state: directory
  loop:
    - /etc/sqlmigrate
    - /var/lib/sqlmigrate
- name: Create sql-migrate configuration file
  ansible.builtin.template:
    src: sqlmigrate/database.yml.j2
    dest: "/etc/sqlmigrate/{{ database_name }}.yml"
- name: Clone schema repository
  ansible.builtin.git:
    repo: "{{ schema_url }}"
    dest: "/var/lib/sqlmigrate/{{ database_name }}"
    version: "{{ branch|default('master') }}" # branch or tag
    force: true
  environment:
    TMPDIR: /run
- name: Run migrations
  ansible.builtin.command:
    cmd: sql-migrate-wrapper up -config /etc/sqlmigrate/{{ database_name }}.json

Ansible playbook command

$ ansible-playbook \
    --extra-vars \
        '{"database_name": "test",
          "branch": "master",
          "schema_url": "ssh://git@hostname:port/project/repository.git"
          }' \
    database-primary-schema-update.yml

Ansible + Consul

The Bastion (SSH)

The Bastion

Ansible + The Bastion

Ansible Wrapper

[ssh_connection]
pipelining = True
private_key_file = ~/.ssh/id_ed25519
ssh_executable = /usr/share/ansible/plugins/bastion/sshwrapper.py
sftp_executable = /usr/share/ansible/plugins/bastion/sftpbastion.sh
transfer_method = sftp
retries = 3

https://github.com/ovh/the-bastion-ansible-wrapper

AWX

AWX provides a web-based user interface, REST API, and task engine built on top of Ansible.

https://github.com/ansible/awx

AWX features

  • REST API
  • Web interface
  • CLI
  • Personal accounts (via SSO/SAML)
  • Notifications (alerting, chat)
  • Scheduled jobs
  • Inventory management
  • Source Control management
  • Machine credentials

AWX Web interface

AWX CLI

awx -f human job_templates launch --monitor --extra_vars \
    '{"database_name": "test",
      "branch": "master",
      "schema_url": "ssh://git@hostname:port/project/repository.git"
      }' \
    database-primary-schema-update

Event detection

How to deploy without human intervention?

Producer

Consumer

Current architecture

Migration failures

  • AWX notifications
    • Chat (Webex)
    • On-call alerts (OpsGenie)
  • NRPE probes (Icinga)

In the end

Heaven

Known issues

Highly concurrent repositories

  • Multiple pull requests at the same time
  • Multiple files
  • From various developpers
  • In the same database repository

Same file names

Predicted file names

Solution

Use a declarative model:

Granular deployment

  • Changes are deployed everywhere at the same time
  • Problems may rise directly on production

Solution

Test on real data

  • “DEV” environment is not available everywhere
  • Anonimization is hard
  • This environment will be removed

Solution

schema-consumer

  • 100% home-made
  • Written in Python
  • Add missing features
  • Patch vulnerabilities
  • Not open

Solution

Event-Driven Ansible

  • Free software
  • Same features (at least)
    • Events from Kafka
    • Create jobs on AWX
  • Easy to use
    • Rulebooks in YAML, similar to playbooks
  • Easy to contribute
    • Add Kafka SASL_SSL authentication (#177)
    • add basic auth to controller (#602)

Takeaways

  • SQL is your friend
  • Automation is easy
  • Start small then iterate
  • Learn from your errors

Thank you

Questions

// reveal.js plugins