Eric Radman : a Journal

Patroni and Consul Service Discovery

Among the tricky aspects of automating database failover is the need to

  1. redirect existing connections to the new master, and
  2. update DNS to reflect the new topology

Patroni's integration with Consul can solve both of these problems: Patroni sets tags for each host indicating its role, Consul keys can be monitored for changes, and Consul itself can publish name service records.

The Postgres client library (libpq) will try hosts in the order they are listed, so we can prefer read-only queries with a fallback to the current master by listing the read-only name first

postgresql://user@raddb-query.service.eradman.com,raddb-writer.service.eradman.com/db
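
The fallback behavior is easy to exercise with psql, which links against libpq. A quick check, assuming the database is named db as in the URI above:

$ psql 'postgresql://user@raddb-query.service.eradman.com,raddb-writer.service.eradman.com/db' \
    -c 'SELECT pg_is_in_recovery()'

If the first name does not answer, libpq silently continues with the next host in the list.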

Local Consul Agent

Patroni uses the DCS (distributed configuration store) as a coordination point, so a Consul cluster that is propagated to other datacenters using consul-replicate will not suffice. Instead, we need to establish a set of Consul servers that spans all datacenters, which I'll call the "global" DC to indicate that it is not a geographical site

# /etc/consul.d/consul.hcl
data_dir = "/var/lib/consul"
datacenter = "global"
domain = "consul.eradman.com"
encrypt = "xxxxxxxxxxxxxxxxxxxxxxxx"
primary_datacenter = "global"
retry_join = ["10.2.0.30", "10.3.0.30", "10.4.0.30"]

The membership should then show servers that maintain consensus across all datacenters

$ consul members
Node                      Address         Status  Type    Build  Protocol  DC      Segment
consul4.east.eradman.com  10.2.0.30:8301  alive   server  1.4.3  2         global  <all>
consul4.north.eradman.com 10.4.0.30:8301  alive   server  1.4.3  2         global  <all>
consul4.west.eradman.com  10.3.0.30:8301  alive   server  1.4.3  2         global  <all>
raddb1.east.eradman.com   10.2.0.90:8301  alive   client  1.4.3  2         global  <default>
raddb1.west.eradman.com   10.3.0.90:8301  alive   client  1.4.3  2         global  <default>
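
As a sanity check, the operator subcommand confirms that these servers have formed a single raft quorum with one leader (output will vary by site)

$ consul operator raft list-peers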

Tags and Prepared Queries

Since Patroni registers tags with each service, we can install a prepared query for each role to enable name server resolution

$ curl -s http://127.0.0.1:8500/v1/query
[
  {
    "Name": "raddb-writer"
    },
    "Service": {
      "Service": "raddb",
      "Failover": {
        "NearestN": 0,
        "Datacenters": [
          "global"
        ]
      },
      "OnlyPassing": true,
      "IgnoreCheckIDs": null,
      "Near": "_agent",
      "Tags": [
        "master"
      ]
    },
    "DNS": {
      "TTL": "30s"
    }
  }
]
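
A query of this shape is installed by POSTing the same structure to that endpoint. A sketch, assuming the JSON body above is saved as raddb-writer.json:

$ curl -s -X POST http://127.0.0.1:8500/v1/query -d @raddb-writer.json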

With these queries in place we can resolve each role separately

$ dig raddb-writer.query.consul.eradman.com +short
$ dig raddb-query.query.consul.eradman.com +short

Redirecting Connections with PgBouncer

Using a connection pooler provides some valuable capabilities, including transaction pooling, the ability to pause and resume connections, and the ability to redirect connections by retrying the backend. One means of reconfiguring PgBouncer is to use consul-template to regenerate its configuration. Keys referenced in the template are automatically monitored for changes

# /etc/consul-template.d/reconfig-pgbouncer.conf
consul = {
  address = "localhost:8500"
  token = "..."
}

template {
  source      = "/etc/pgbouncer/pgbouncer.ini.ctmpl"
  destination = "/var/lib/consul-template/pgbouncer.ini"
  wait {
    min = "10s"
    max = "30s"
  }
  exec {
    command = "sudo /usr/local/sbin/reconfig-pgbouncer"
  }
}

A minimal pgbouncer configuration includes

[databases]
* = host={{ key "postgresql/raddb/leader" }}

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
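
Before enabling the service it is worth checking that the template renders as expected; consul-template can print the result to stdout without installing it

$ consul-template -config /etc/consul-template.d/reconfig-pgbouncer.conf -dry -once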

Since consul-template runs the command at startup as well as on a key change, it can be used for maintenance tasks such as synchronizing users

#!/bin/sh
# /usr/local/sbin/reconfig-pgbouncer
trap 'echo "exit code $? on line $LINENO"; exit' ERR

# install pgbouncer configuration and reload
src=/var/lib/consul-template/pgbouncer.ini
if [ -s $src ]; then
  cp $src /etc/pgbouncer/pgbouncer.ini; rm $src
  kill -HUP $(cat /var/run/pgbouncer/pgbouncer.pid)
fi

# refresh userlist
psql -Atq -U postgres <<SQL > /etc/pgbouncer/userlist.txt
  SELECT '"pgbouncer" "md5xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"' AS user_hash
  UNION
  SELECT concat('"', usename, '" "', passwd, '"')
  FROM pg_shadow
  WHERE usename NOT IN ('postgres')
  ORDER BY user_hash
SQL

pgBackRest

There are two places where we should check that the pgBackRest repository is initialized: when the cluster is first bootstrapped, and when a new primary is promoted

# patroni.yml
bootstrap:
  post_init: /var/lib/pgsql/bin/post_init.sh

postgresql:
  parameters:
    archive_command: 'pgbackrest --stanza=main archive-push %p'
    archive_mode: 'on'

  callbacks:
    on_role_change: /var/lib/pgsql/bin/post_init.sh

Note that 'on' is quoted to prevent this value from being loaded as a Python boolean.
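
This YAML 1.1 behavior can be seen directly with PyYAML (an illustration; any Python 3 with the yaml module will do)

$ python3 -c 'import yaml; print(yaml.safe_load("archive_mode: on"))'
{'archive_mode': True}
$ python3 -c 'import yaml; print(yaml.safe_load("archive_mode: \"on\""))'
{'archive_mode': 'on'}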

The script doing the housework after a failover should create the backup stanza from scratch or upgrade it, based on the exit code of pgbackrest check

#!/bin/sh

# ensure the per-host backup directory exists
backup_path=/pgbackup-0/$(hostname -s)
[ -d $backup_path ] || mkdir $backup_path

# the exit code of 'check' tells us whether the stanza
# needs to be upgraded or created
pgbackrest --stanza=main check
case $? in
    28|51)
        pgbackrest --stanza=main --log-level-console=info stanza-upgrade
        ;;
    55)
        pgbackrest --stanza=main --log-level-console=info stanza-create
        ;;
esac

An example cron schedule might look like this

0  15   *   *   0     [ $(psql -c "SELECT pg_is_in_recovery()" -At) = "t" ] || pgbackrest --type=full --stanza=main backup
0  15   *   *   1-6   [ $(psql -c "SELECT pg_is_in_recovery()" -At) = "t" ] || pgbackrest --type=incr --stanza=main backup

Major Version Upgrades

The mechanism for upgrading a Patroni cluster is a tool that you will need to build for yourself. To do this, stand up a test cluster, and repeatedly rebuild/upgrade it until the bugs are shaken out of your script. The basic steps are:

  1. Backup files (leader)
    pg_hba.conf, pg_ident.conf
  2. Stop patroni (all hosts)
    sudo rcctl stop patroni
  3. Erase consul state (leader)
    patronictl -c patroni.yml remove {0}
  4. Initialize new database (leader)
    /usr/pgsql-{0}/bin/initdb -k -D /pg_data/{0} --encoding=UTF8 --locale=en_US.UTF-8
  5. Run upgrade (leader)
    /usr/pgsql-{0}/bin/pg_upgrade -b /usr/pgsql-{1}/bin -B /usr/pgsql-{0}/bin -d /pg_data/{1} -D /pg_data/{0}
  6. Restore files (leader)
    cp /tmp/pg_hba.conf.saved /pg_data/{0}/pg_hba.conf
  7. Update patroni with new paths (all hosts)
    sed -i -e 's:/pg_data/{1}:/pg_data/{0}:g' -e 's:/usr/pgsql-{1}:/usr/pgsql-{0}:g' patroni.yml
  8. Start patroni (all hosts)
    sudo rcctl start patroni
  9. Analyze (leader)
    /usr/pgsql-{0}/bin/vacuumdb --all --analyze-in-stages
  10. Push updated configuration

The upgrade script depends on some state which is dynamic; we can query Patroni itself for the topology and drive the rest of the upgrade

#!/usr/bin/python3

import sys
import requests

class PatroniCluster:
    def __init__(self, url):
        # current version and scope from the Patroni REST API
        r = requests.get(url + "/").json()
        self.version = r['server_version'] // 10000
        self.next_version = self.version + 1
        self.database = r['patroni']['scope']

        # roles
        self.leader = None
        self.replicas = []
        r = requests.get(url + "/cluster").json()
        for member in r['members']:
            hostname = str(member['name'])
            if member['role'] == 'leader':
                self.leader = hostname
            else:
                self.replicas.append(hostname)

        self.all_hosts = [self.leader] + self.replicas

    def info(self):
        print(" version:", self.version, "->", self.next_version)
        print("database:", self.database)
        print("  leader:", self.leader)
        print("replicas:", ",".join(self.replicas))

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: patroni-upgrade.py patroni_host")
        sys.exit(1)
    url = "http://{}:8008".format(sys.argv[1])

    cluster = PatroniCluster(url)
    cluster.info()
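
Running it against any member prints the values that the remaining upgrade steps are templated from (the output shown here is illustrative)

$ ./patroni-upgrade.py raddb1.east.eradman.com
 version: 13 -> 14
database: raddb
  leader: raddb1.east.eradman.com
replicas: raddb1.west.eradman.com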

Key-store Maintenance

Patroni periodically writes to the DCS, and if Consul is not responding properly all databases will be restarted in read-only mode. To avoid this during planned key-store maintenance you can disable failover on each cluster

patronictl -c patroni.yml pause
# DCS maintenance
patronictl -c patroni.yml resume
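
Before touching the key store, listing the members should confirm that the cluster is paused; recent Patroni versions note maintenance mode in the output

patronictl -c patroni.yml list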

Restoring from a Backup

Begin by stopping all of the standby databases and finally the leader. Then, on the leader, move the old data directory aside and run the restore

mv /pg_data/{14,14.old}
mkdir /pg_data/14
pgbackrest --stanza=main --log-level-console=warn restore

After the restore completes, start the database, resume WAL replay, and then stop it

export PATH=/usr/pgsql-14/bin:$PATH
pg_ctl -D /pg_data/14 start
psql -c 'select pg_wal_replay_resume()'
pg_ctl -D /pg_data/14 stop

Finalize by starting the Patroni leader. Before bringing the standbys up remove their data directories to cause Patroni to run a full synchronization.
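
On each standby this amounts to something like the following sketch, reusing the version-14 paths from above and the rcctl service manager from the upgrade steps

rm -rf /pg_data/14
sudo rcctl start patroni

Patroni will notice the empty data directory and rebuild the standby from the leader.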

Last updated on November 21, 2021