Database schema management for lazybones: from chaos to heaven

SCALE 21x

Julien RIOU

March 15, 2024



Speaker


Summary

  1. Basics of schema migrations
  2. Who are we?
  3. Chaos
  4. Schema as code, CI/CD, code reviews
  5. Inventory, automation
  6. Heaven
  7. Known issues
  8. Takeaways

Basics of schema migrations


Schema

https://github.com/devrimgunduz/pagila


Schema changes over time


Schema migration

Data definition language (DDL)

ALTER TABLE customer
    ADD COLUMN archived_at TIMESTAMP WITH TIME ZONE;

Who are we?


Internal databases


Statistics


Cluster example


Chaos


Who can ALTER my database?


Startup mindset


The DBA forgot on which host the database is


The DBA misunderstood what developer really wanted


The DBA executed the query in Europe but forgot Canada


The DBA forgot to create a transaction and one query failed in the middle


The DBA started a transaction but forgot to commit


The DBA created a query that runs for too long and locks production


The DBA forgot to use SET ROLE and permissions are now broken


Blame the DBA!

Link to image


Ticketing

Let’s write migrations in a ticketing system


OTRS


Jira


But

  • Multiple ticketing systems for different teams
  • SQL statement are not well formatted
  • Bad copy pastes
  • Poor reviewing system


Schema as code


Git


sql-migrate

SQL schema migration tool for Go

https://github.com/rubenv/sql-migrate


Developers create code:

v0001.sql

-- +migrate Up
create table x (...);

-- +migrate Down
drop table x;

Link to image generator




DBAs deploy with a command:

sql-migrate-wrapper


# sql-migrate-wrapper status -config /etc/sqlmigrate/test.json 
INFO[2024-02-13T10:28:47Z] Processing 'test' database                   
INFO[2024-02-13T10:28:47Z] Processing normal migration path             
+-----------+--------------------------------------+
| MIGRATION |               APPLIED                |
+-----------+--------------------------------------+
| v0001.sql | 2022-03-03 15:33:39.709067 +0100 CET |
| v0002.sql | 2022-11-30 12:43:34.947511 +0100 CET |
| v0003.sql | 2022-11-30 12:53:27.437182 +0100 CET |
| v0004.sql | 2023-01-24 17:20:22.330472 +0100 CET |
| v0005.sql | 2023-05-17 12:32:39.87181 +0200 CEST |
+-----------+--------------------------------------+


Code reviews


Features




CI/CD


Terminology


Tests without data


CDS

CDS is an Enterprise-Grade Continuous Delivery & DevOps Automation Open Source Platform.

https://ovh.github.io/cds/


Implementation


Web hooks


CDS workflow


CDS pipeline




Tests with data

Backups are based on ZFS snapshots performed and sent everyday to an anonymised dev environment:


Deployment


Workflow


Concurrent sessions

lock_timeout (2000)

Abort any statement that waits longer than the specified amount of time while attempting to acquire a lock on a table, index, row, or other database object.



Inventory

Where should I connect to run the migrations?


Custom inventory

In a database of databases:



REST API

$ curl https://<api>/database?name=test
{
  "code": 200,
  "data": [
    {
      "schemarepository": "<uuid>",
      "name": "test",
      "id": "<uuid>",
      "cluster": "<uuid>"
    }
  ]
}

But

  • Every change has to be declared
    • Add a cluster, host, database, git repository, …
    • Remove a cluster, host, database, git repository, …
  • Easy to miss an event
  • 100% home-made

Close icons created by hqrloveq - Flaticon


Service discovery

Consul


Consul

Consul service discovery


Consul


Static configuration


Dynamic configuration


Where is my database?

Consul service


What databases are behind the repository?

Special file in the repository meta.json

{
  "databases": {
    "US": [
      "test"
    ]
  }
}


Automation

How to apply migrations safely?


Ansible

ansible

Ansible is a radically simple IT automation system.

https://github.com/ansible/ansible


Concepts



Playbook overview

- name: Check arguments
  hosts: all
  run_once: true
  delegate_to: localhost
  tasks:
    - name: Check variable schema_url    # fail fast
    - name: Check variable database_name # fail fast

- name: Update database to the latest schema migration
  hosts: "{{ database_name }}:&subrole_primary"
  tasks:
    - name: Create sql-migrate directories
    - name: Create sql-migrate configuration file
    - name: Clone schema
    - name: Run migrations

Playbook tasks

- name: Create sql-migrate directories
  ansible.builtin.file:
    path: "{{ item }}"
    state: directory
  loop:
    - /etc/sqlmigrate
    - /var/lib/sqlmigrate

- name: Create sql-migrate configuration file
  ansible.builtin.template:
    src: sqlmigrate/database.yml.j2
    dest: "/etc/sqlmigrate/{{ database_name }}.yml"

- name: Clone schema repository
  ansible.builtin.git:
    repo: "{{ schema_url }}"
    dest: "/var/lib/sqlmigrate/{{ database_name }}"
    version: "{{ branch|default('master') }}" # branch or tag
    force: true
  environment:
    TMPDIR: /run

- name: Run migrations
  ansible.builtin.command:
    cmd: sql-migrate-wrapper up -config /etc/sqlmigrate/{{ database_name }}.json

Ansible playbook command

$ ansible-playbook \
    --extra-vars \
        '{"database_name": "test",
          "branch": "master",
          "schema_url": "ssh://git@hostname:port/project/repository.git"
          }' \
    database-primary-schema-update.yml

Ansible + Consul


The Bastion (SSH)

The Bastion

Ansible + The Bastion

Ansible Wrapper

[ssh_connection]
pipelining = True
private_key_file = ~/.ssh/id_ed25519
ssh_executable = /usr/share/ansible/plugins/bastion/sshwrapper.py
sftp_executable = /usr/share/ansible/plugins/bastion/sftpbastion.sh
transfer_method = sftp
retries = 3

https://github.com/ovh/the-bastion-ansible-wrapper

SCP is deprecated, use SFTP instead.


AWX

AWX provides a web-based user interface, REST API, and task engine built on top of Ansible.

https://github.com/ansible/awx


AWX features


AWX Web interface



AWX CLI

awx -f human job_templates launch --monitor --extra_vars \
    '{"database_name": "test",
      "branch": "master",
      "schema_url": "ssh://git@hostname:port/project/repository.git"
      }' \
    database-primary-schema-update


Event detection

How to deploy without human intervention?


Producer


Consumer

Consumer is 100% home-made today. We plan to replace this block by Event-Driven Ansible.


Current architecture




Migration failures


AWX notification on OpsGenie


NRPE probe notification on OpsGenie


In the end



Heaven


Known issues


Highly concurrent repositories



Same file names




Predicted file names






Solution

Use a declarative model:

If anybody has a feedback on those tools or another, please feel free to contact me.


Granular deployment


Solution

When merge? Before or after deployment to non-critical environments? What git branches? What orchestration around those git branches? How to rollback?


Test on real data


Solution

Test schema migrations on the backup host directly:

  • clone the latest ZFS dataset
  • create a PostgreSQL instance on this clone
  • run migration
  • cleanup

schema-consumer


Solution

Event-Driven Ansible



Takeaways


Thank you


Questions