Automating Internal Databases Operations at OVHcloud with Ansible


CfgMgmtCamp 2024

Julien RIOU

February 6, 2024

Speaker

Summary

  • Who are we?
  • Managed infrastructure
  • Management tools
  • Ansible code base
  • Real-world examples
  • Implementation
  • Development
  • What’s next?

Who are we?

  • Major cloud provider in Europe
  • Datacenters worldwide
  • Baremetal servers, public & private cloud, managed services

Managed infrastructure

  • 3 DBMS (MySQL, MongoDB, PostgreSQL)
  • 7 autonomous infrastructures worldwide
  • 500+ servers
  • 2000+ databases
  • 100+ clusters
  • Highly secure environments

Cluster example

Shared (multi-tenant) environment

Management tools

Infrastructure as Code

Terraform

  • Manage infrastructure lifecycle
    • Create, replace, destroy
    • Scale up, down
  • Providers: OVH, vSphere, phpipam, AWS
  • Use standard providers first

Configuration management

Puppet

  • Manage operating system security hardening
  • Install and configure packages (including DBMS)
  • Agent is run manually on internal databases

One-shot operations

Ansible

  • Requests from users
  • Maintenance operations
  • Orchestration of multiple tasks
  • Acting on external resources

Operation examples

  • Bootstrap clusters
  • Create/move/delete databases, users, permissions
  • Test/apply schema migrations
  • Minor/major upgrades
  • Reboot and decrypt servers, clusters
  • Daily restores

Automation

  • Reduce human errors
  • Free human time and energy
  • Focus on what’s important

Deep dive into Ansible

Code base

Architecture of a playbook

  • Playbook
    • Play
      • include task
      • include task
    • Play
      • include task

Reusable tasks

  • No role, only tasks
  • Located under tasks directory
  • One task = one module
  • Tasks can be included by one or more playbooks
  • Naming convention is scope-action.yml
  • Idempotence
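
As a sketch of the convention above, a playbook could be assembled purely from included task files. The file names below (sqlmigrate-configure.yml, sqlmigrate-run.yml) and the relative path are hypothetical; they only illustrate the scope-action.yml pattern and are not taken from the actual code base.

```yaml
# playbooks/schema-migrate.yml (hypothetical file name)
- name: example play built only from reusable task files
  hosts: "{{ database_name }}:&subrole_primary"
  tasks:
    # scope = sqlmigrate, action = configure
    - name: include sql-migrate configuration task
      ansible.builtin.include_tasks:
        file: ../tasks/sqlmigrate-configure.yml

    # scope = sqlmigrate, action = run
    - name: include sql-migrate run task
      ansible.builtin.include_tasks:
        file: ../tasks/sqlmigrate-run.yml
```

Because each file wraps a single module call, the same task file can be included from several playbooks without dragging along unrelated steps.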

Real-world examples

  • Schema migrations
  • Database creation
  • Minor upgrades
  • Database migrations

Schema migrations

  • Applications evolve all the time
  • Database schemas too
  • Reviewed and applied by DBAs

Schema migrations

sql-migrate

-- +migrate Up
create table author (
    id   bigserial primary key,
    name text not null
);

create table talk (
    id        bigserial primary key,
    title     text not null,
    author_id bigint not null references author(id)
);

-- +migrate Down
drop table author, talk;

Schema migrations

  • Move forward with sql-migrate up
  • Rollback with sql-migrate down

Playbook overview

- name: check arguments
  hosts: all
  run_once: true
  connection: local  # run on the control node (delegate_to is a task-level keyword, not a play keyword)
  tasks:
    - name: check variable schema_url    # fail fast
    - name: check variable database_name # fail fast
- name: update database to the latest schema migration
  hosts: "{{ database_name }}:&subrole_primary"
  tasks:
    - name: create sql-migrate directories
    - name: create sql-migrate configuration file
    - name: clone schema repository
    - name: run migrations
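
The overview above only lists task names. A minimal sketch of what the fail-fast checks in the first play might look like, assuming they are implemented with ansible.builtin.assert (the slides do not say which module is used):

```yaml
- name: check arguments
  hosts: all
  gather_facts: false
  run_once: true
  tasks:
    - name: check variable schema_url
      ansible.builtin.assert:
        that:
          - schema_url is defined
          - schema_url | length > 0
        fail_msg: "schema_url is required"
      delegate_to: localhost   # evaluated on the control node

    - name: check variable database_name
      ansible.builtin.assert:
        that:
          - database_name is defined
          - database_name | length > 0
        fail_msg: "database_name is required"
      delegate_to: localhost
```

Failing here, before any remote host is contacted, avoids a half-applied run when an argument is missing.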

Playbook tasks

- name: create sql-migrate directories
  ansible.builtin.file:
    path: "{{ item }}"
    state: directory
  loop:
    - /etc/sqlmigrate
    - /var/lib/sqlmigrate
- name: create sql-migrate configuration file
  ansible.builtin.template:
    src: sqlmigrate/database.yml.j2
    dest: "/etc/sqlmigrate/{{ database_name }}.yml"
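
For context, a possible shape for the sqlmigrate/database.yml.j2 template is sketched below. The keys (dialect, datasource, dir, table) follow the sql-migrate configuration format; the environment name development is used because it is the default when sql-migrate is run without -env. The datasource string and the migrations directory are assumptions, not values from the talk.

```yaml
# templates/sqlmigrate/database.yml.j2 (sketch, values are assumptions)
development:
  dialect: postgres
  # connect over the local socket to the target database (assumed setup)
  datasource: "host=/var/run/postgresql dbname={{ database_name }} sslmode=disable"
  # directory containing the cloned migration files (assumed layout)
  dir: "/var/lib/sqlmigrate/{{ database_name }}"
  # table used by sql-migrate to record applied migrations
  table: migrations
```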

Playbook tasks

- name: clone schema repository
  ansible.builtin.git:
    repo: "{{ schema_url }}"
    dest: "/var/lib/sqlmigrate/{{ database_name }}"
    version: "{{ branch | default('master') }}" # branch or tag
    force: true
  environment:
    TMPDIR: /run
- name: run migrations
  ansible.builtin.command:
    cmd: sql-migrate up -config /etc/sqlmigrate/{{ database_name }}.yml

Database creation

Just run CREATE DATABASE.

Easy, right?

Well…

Database creation

  1. Check arguments
  2. Select an available cluster
  3. Create git repository
  4. Run CREATE DATABASE (using a module)
  5. Create secrets
  6. Create roles and users (for applications, humans)
  7. Link the database to the git repository
  8. Run schema migrations
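
Steps 4 and 6 could be sketched with modules from the community.postgresql collection as follows. The variable names (cluster_group, app_password), the user naming scheme, and the use of become are assumptions for illustration; the real playbook also handles cluster selection, git, and secrets, which are not shown.

```yaml
- name: create a database and its application user (sketch)
  hosts: "{{ cluster_group }}:&subrole_primary"   # cluster_group is hypothetical
  become: true
  become_user: postgres
  tasks:
    - name: create database
      community.postgresql.postgresql_db:
        name: "{{ database_name }}"
        state: present

    - name: create application user
      community.postgresql.postgresql_user:
        name: "{{ database_name }}_app"            # hypothetical naming scheme
        password: "{{ app_password }}"             # assumed to come from the secrets step
        state: present
      no_log: true                                 # keep the password out of logs
```

Both modules are idempotent, so re-running the play on an existing database reports no change.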

Minor upgrades

Ensure software is up to date:

  • Security fixes
  • Bug fixes

Minor upgrades

  • Upgrade packages (DBMS, system)
  • Reboot (if needed)
  • Restart DBMS (if needed)
  • Order by role criticality
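
A minimal sketch of the ordering idea, assuming Debian-like hosts (apt, /var/run/reboot-required) and a hypothetical cluster_group variable. Replicas are treated one by one before the primary; the DBMS restart and any switchover logic are deliberately left out.

```yaml
- name: upgrade replicas first (less critical)
  hosts: "{{ cluster_group }}:&subrole_replica"
  serial: 1                      # one host at a time
  become: true
  tasks: &upgrade_tasks
    - name: upgrade packages
      ansible.builtin.apt:
        update_cache: true
        upgrade: dist

    - name: check whether a reboot is required
      ansible.builtin.stat:
        path: /var/run/reboot-required
      register: reboot_required

    - name: reboot if needed
      ansible.builtin.reboot:
        reboot_timeout: 600
      when: reboot_required.stat.exists

- name: upgrade the primary last (most critical)
  hosts: "{{ cluster_group }}:&subrole_primary"
  serial: 1
  become: true
  tasks: *upgrade_tasks          # same task list, reused through a YAML alias
```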

Minor upgrade (1/2)

Minor upgrade (2/2)

Database migration

  • Cluster is about to reach maximum capacity
  • Colocate or spread logical divisions
  • Isolate noisy neighbours
  • Major upgrades

Database migration

Move one or more databases from one cluster to another

  1. Set up logical replication
  2. Promote
    • Check
    • Migrate
    • Rollback
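
For PostgreSQL, step 1 might be sketched with the publication and subscription modules from community.postgresql. All variable names (source_cluster_group, target_cluster_group, source_host, replication_user, replication_password) and the object names migration_pub and migration_sub are hypothetical, and the schema is assumed to exist already on the target. This is not the actual OVHcloud playbook.

```yaml
- name: publish the database on the source cluster
  hosts: "{{ source_cluster_group }}:&subrole_primary"
  become: true
  become_user: postgres
  tasks:
    - name: create a publication (all tables when no table list is given)
      community.postgresql.postgresql_publication:
        login_db: "{{ database_name }}"
        name: migration_pub
        state: present

- name: subscribe from the target cluster
  hosts: "{{ target_cluster_group }}:&subrole_primary"
  become: true
  become_user: postgres
  tasks:
    - name: create the subscription
      community.postgresql.postgresql_subscription:
        login_db: "{{ database_name }}"
        name: migration_sub
        state: present
        publications:
          - migration_pub
        connparams:
          host: "{{ source_host }}"
          dbname: "{{ database_name }}"
          user: "{{ replication_user }}"
          password: "{{ replication_password }}"
      no_log: true               # connection parameters contain a password
```

Once the subscription has caught up, the promote step (check, migrate, rollback) can switch traffic to the target cluster.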

Database migration

  • Moved out of a datacenter last year with this method
  • 400+ databases
  • 16.78 TiB
  • Under 30 minutes of downtime for the datacenter move
    • Big focus on playbook execution time
  • Thanks to Ansible

External collections

  • community.general
  • community.mysql
  • community.mongodb
  • community.postgresql

Internal collections

  • ovhcloud.internal
  • ovhcloud.mysqlsh
  • ovhcloud.patronictl
  • ovhcloud.sqlmigrate

Implementation

How we use Ansible

Secure Shell (SSH)

How can we securely connect to remote hosts to perform actions?

The Bastion


Ansible + The Bastion

“Ansible Wrapper”

[ssh_connection]
pipelining = True
private_key_file = ~/.ssh/id_ed25519
ssh_executable = /usr/share/ansible/plugins/bastion/sshwrapper.py
sftp_executable = /usr/share/ansible/plugins/bastion/sftpbastion.sh
transfer_method = sftp
retries = 3

https://github.com/ovh/the-bastion-ansible-wrapper

Inventory

Where can we find our hosts to perform operations?

Consul


Consul service discovery

Consul

  • Nodes
    • name, IP address, meta(data)
  • Services
    • databases
  • Access control list (ACL) with tokens
  • Encryption

Static configuration

  • Node meta
    • server_type
      • postgresql, mysql, filer, …
    • role
      • node, lb, backup, …
    • cluster identifier

Dynamic configuration

  • Node “subrole”
    • primary, replica
  • Database services

Where is my database?

Consul service

Ansible + Consul

How to use the inventory?

With a limit option

ansible server_type_postgresql -m ping

ansible-playbook -l server_type_postgresql playbook.yml

Group combination

  • & for intersection (AND)
  • : for multiple groups (OR)
  • ! for exclusion (NOT)

ansible-playbook -l 'test:&subrole_primary' playbook.yml
ansible-playbook -l 'server_type_postgresql:server_type_mysql' playbook.yml
ansible-playbook -l 'server_type_postgresql:!cluster_99' playbook.yml
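
To make the patterns concrete, here is a small static inventory standing in for the groups the Consul-based inventory would produce. The host names and the cluster_1 group are made up for illustration; only the group naming style (server_type_*, subrole_*, cluster_*, database name) follows the examples above.

```yaml
# inventory.yml: static stand-in for the dynamic inventory (hypothetical hosts)
all:
  children:
    server_type_postgresql:
      hosts:
        pg-1:
        pg-2:
    cluster_1:
      hosts:
        pg-1:
        pg-2:
    subrole_primary:
      hosts:
        pg-1:
    subrole_replica:
      hosts:
        pg-2:
    test:            # group named after a database hosted on this cluster
      hosts:
        pg-1:
        pg-2:
```

With this inventory, the pattern 'test:&subrole_primary' selects only pg-1, the primary of the cluster hosting the test database.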

Execution environments

Where does Ansible run?

Admin server

  • Virtual machine
  • Access via SSH
  • Shared environment
  • No API

AWX

  • Ansible orchestration
  • Running on Kubernetes
  • Personal accounts (via SSO/SAML)
  • REST API, web interface, CLI
  • Notifications (alerting, chat)
  • https://github.com/ansible/awx

Concepts

  • Organization, projects, teams, users, privileges
  • Inventory source
  • Source Control (Git) and Machine (SSH) credentials
  • Job templates
  • Scheduled jobs
  • Notification templates

AWX UI