Sometimes you take up a task because it needs to be done even though no one will notice. Setting up backups with real-time updates and point-in-time recovery may go unnoticed for a period, but it is not a thankless task.
The first time something very bad happens you will have a good story to tell. This story includes the fact that we have proper backups, but it also includes the changes and updates that you make every day to enable applications to evolve and adapt. Daily changes are best made under calm conditions.
Mistakes will happen, and when they do a full database replica with streaming updates provides the means of recovering. Not living in terror of data loss means you can get a good night's rest, and write better software during the day. If this software can be moved into production without too much fuss you will definitely be thanked.
Making a Copy
In times past rsync (along with invocations to pg_start_backup() and pg_stop_backup()) was the most
effective way of bringing up a new standby server, but these days pg_basebackup
is usually safer and more elegant. It operates by connecting as a replication
client, so we first need to configure the database server as a master:
# postgresql.conf
wal_level = hot_standby
max_wal_senders = 4
wal_keep_segments = 100
The value of wal_keep_segments instructs the server to keep at least this
number of WAL segments. This will allow a replica to catch up if it is
disconnected. For the purpose of a backup it also allows us to sync up after
the initial clone.
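Since each WAL segment is 16MB by default, it is easy to estimate how much disk space this setting pins down (a quick sketch; the numbers assume the default segment size):

```shell
# Each WAL segment is 16MB by default, so wal_keep_segments = 100
# reserves roughly 1.6GB of disk for replicas that fall behind.
segment_mb=16
kept_segments=100
echo "$(( segment_mb * kept_segments )) MB"   # → 1600 MB
```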
After a restart (wal_level and max_wal_senders cannot be changed with a
reload) we can create a copy of the master using
pg_basebackup -U postgres -D backup
Hopefully you are taking this one step further to create an off-site replica.
We can use the --max-rate option to rate-limit the stream over a WAN:
mkdir -m 0700 backup
cd backup
pg_basebackup -h primary -U postgres -D - -Ft -X fetch -z -vP \
    --max-rate=10M | tar -xzf -
The options supplied to pg_basebackup will stream the results over STDOUT in the tar format while displaying periodic progress updates on STDERR.
LOG: transaction log start point: 0/5000024 on timeline 1
LOG: 42286/42286 kB (100%), 1/1 tablespace
LOG: transaction log end point: 0/50000DC
LOG: pg_basebackup: base backup completed
Note that everything is copied from the master, including configuration files and log files. Starting a server against the new backup is now as easy as pointing it at the cloned data directory
cp -R backup backup_copy
pg_ctl -D backup_copy start
Tailing pg_log/* should indicate that the server was able to reach a consistent state
LOG: creating missing WAL directory "pg_wal/archive_status"
LOG: redo starts at 36/750000C8
LOG: consistent recovery state reached at 36/75751FC8
Now that we have a clone, pg_receivewal can be used to stream transaction logs from the primary
pg_receivewal -h primary -U postgres -D backup
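One caveat with this approach: if pg_receivewal disconnects for longer than the wal_keep_segments window, the primary can recycle segments before they are received. A physical replication slot avoids this by making the primary retain WAL until it has been consumed. A sketch, assuming a live primary; the slot name backup_wal is an arbitrary choice:

```shell
# Create the slot once (--if-not-exists makes this idempotent), then
# stream WAL through it so the primary retains any unsent segments.
pg_receivewal -h primary -U postgres --create-slot --if-not-exists --slot=backup_wal
pg_receivewal -h primary -U postgres --slot=backup_wal -D backup
```

Be aware that an abandoned slot causes WAL to accumulate on the primary indefinitely, so drop it with pg_drop_replication_slot() if the backup host is ever retired.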
Running a replica will usually protect you from a hardware failure, but it
will not protect you from operator error or a faulty application.
Point-in-time recovery enables you to replay the transactions up to a specific
point in time. It is activated by defining recovery_target_time, and
Postgres will then replay each transaction log until the specified timestamp
# recovery.conf
recovery_target_time = '2015-06-29 08:00:00'
restore_command is used to copy files from a transaction log archive. The
examples above are not using the archive feature, but this command is still
required. When the server is started the restore_command is executed and will
fail (because the archive does not exist), and the WAL files already present
in the data directory will be used instead.
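Putting the pieces together, a minimal recovery.conf for this style of point-in-time recovery might look like the following sketch; /archive is a hypothetical path that does not exist, which is exactly what makes the fallback work:

```conf
# recovery.conf
# /archive is a hypothetical path; the copy fails because no archive exists,
# so Postgres falls back to the WAL files already in the data directory.
restore_command = 'cp /archive/%f "%p"'
recovery_target_time = '2015-06-29 08:00:00'
```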
pg_ctl -D backup start
Postgres will record that it is running a point-in-time recovery to the specified time
LOG: starting point-in-time recovery to 2015-06-29 08:00:00-04
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 36/756E53B8
Once the restore succeeds recovery.conf is renamed to recovery.done and the
server starts up in master mode with a new timeline.
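The new timeline can be confirmed from the control data of the recovered cluster (a sketch; assumes the data directory used above):

```shell
# The timeline is recorded in the cluster's control file; it should be
# one higher than the timeline reported when the clone was taken.
pg_controldata backup | grep "TimeLine"
```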
Establishing a Delayed Replica
Storing WAL using pg_receivewal works well for keeping transaction logs, but
there is no way to apply these regularly, so a fresh full backup needs to be
taken periodically. Starting a server in-place on the backup will replay the
WAL up to date, but a new timeline will be established during this process
which will prevent us from receiving further WAL updates.
[I think] the most elegant solution to this is to run "live backups" using a delayed replica.
# recovery.conf
recovery_min_apply_delay = '5d'
Since these servers are mostly idle in between applying updates, a number of them can be run on a single host using different port numbers; my preference is to stand them up in small-footprint VMs in a way nearly identical to every other standby.
As of PostgreSQL 10, standby servers with an apply delay do not reconnect in a timely manner. A patch is available which ensures that delayed replicas resume streaming.
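To confirm that a delayed replica is actually holding back as intended, you can ask the standby how far behind real time its last replayed transaction is (a sketch; assumes psql access to a host named standby):

```shell
# On the standby: how far behind real time is the last replayed transaction?
# With recovery_min_apply_delay = '5d' this should hover around five days.
psql -h standby -U postgres -Atc \
  "SELECT now() - pg_last_xact_replay_timestamp() AS apply_lag;"
```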
Verifying Delayed Replica Status
Before we can rest assured that we have a usable copy of a database on another server or at another site there is one final task: verifying that our standby servers are up to date. We can do this by comparing the log sequence number flushed to disk on the master with the corresponding record that should have been received on each standby.
localharvest-writer=> SELECT pg_current_wal_flush_lsn();
 pg_current_wal_flush_lsn
--------------------------
 5/43C9D298
localharvest-query=> select '5/43C9D298' - pg_last_wal_receive_lsn() AS lsn_delta;
 lsn_delta
-----------
     -6512
Alert if the standby server is more than some number of bytes behind the master.
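The same delta can be computed outside the server when collecting LSNs from several hosts. An LSN of the form XXX/YYYYYYYY is just a 64-bit byte position written as two hex halves, so a small helper is enough (a sketch; lsn_to_bytes is a hypothetical name, and both LSN values are illustrative):

```shell
# Convert a PostgreSQL LSN (XXX/YYYYYYYY) into an absolute byte position.
lsn_to_bytes() {
    local hi=${1%/*} lo=${1#*/}
    echo $(( 0x$hi * 4294967296 + 0x$lo ))
}

# Byte lag between a flush LSN from the master and a receive LSN
# reported by a standby:
echo $(( $(lsn_to_bytes 5/43C9D298) - $(lsn_to_bytes 5/43C9B2E8) ))   # → 8112
```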