PostgreSQL Logical Replication
SQL databases provide a critical service by giving the application globally consistent access to data, but logical replication makes it possible to trade in the guarantee that a query always reads the most recent write for other features. Instead of global behavior, logical replication allows a relational database to target particular features for distributed processing.
I may be mistaken, but I believe the ability to replicate a subset of data
between databases that are otherwise unrelated has become a unique feature of
relational databases.
Unlike statement-based replication, logical replication is PostgreSQL is based
on the WAL, and enables strict guarantees.
It's also a modular architecture, but the built-in plugin
(pgoutput
)
can usually be assumed.
Terminology and Naming Conventions
Logical replication can employed in arbitrarily complex ways. Here is one naming convention where there is a single stream from one database to another:
- Name the subscription based on the database sending data
- Name the publication and slot based on the database receiving data
Note that a subscription may use one or more publications but for simplicity the above recommendation assumes each subscription only has one corresponding publication.
Server Configuration
There are two requirements to enable logical replication:
-
A user with the
replication
attribute -
Set
wal_level
to
logical
PostgreSQL will not allow a subscription to be created until the upstream server meets these requirements.
Selecting Tables for Export
CREATE PUBLICATION archivedb FOR TABLE farmer, product;
The
psql(1)
commands for some operations are not obvious, list command summaries using
\?
.
localharvest=> \dRp+ List of publications Name | Owner | All tables | Inserts | Updates | Deletes -----------+--------------+------------+---------+---------+--------- archivedb | localharvest | f | t | t | t (1 row) Tables: "public.farmer" "public.product"
List subscriptions using
\dRs
or
SELECT * FROM pg_stat_subscription;
Starting with PostgreSQL 15, all tables (and materialized views) may be exported from a specified schema
CREATE PUBLICATION archivedb FOR TABLES IN SCHEMA public, archive;
Establishing a Connection
CREATE SUBSCRIPTION localharvest CONNECTION 'host=db1 dbname=localharvest user=localharvest password=XXXXXX' PUBLICATION archivedb WITH ( slot_name=archivedb );
Tables must have a primary key, otherwise, and unhelpful error such as
ERROR: relation "public.farmer" does not exist
Subscriptions can only created by connecting to a master
ERROR: could not create replication slot "localharvest": ERROR: logical decoding cannot be used while in recovery
Once a subscription is created a corresponding replication slot is created on the master
localharvest=# \x localharvest=# select * from pg_replication_slots; -[ RECORD 1 ]-------+-------------- slot_name | archivedb plugin | pgoutput slot_type | logical datoid | 16478 database | localharvest temporary | f active | t active_pid | 41867 xmin | catalog_xmin | 606 restart_lsn | 0/16C0280 confirmed_flush_lsn | 0/16C02B8
Triggers
Postgres will run triggers based on remote updates if
REPLICA TRIGGER
is enabled
ALTER TABLE farmer ENABLE REPLICA TRIGGER;
Mirroring Table Definitions
The synchronization of table contents is handled automatically, providing the tables exists in both places.
# publisher pg_dump -s -t farmer -t product > replicated_tables.sql
# subscriber psql -f replicated_tables.sql
If a column is added in one database, replication will be blocked but PostgreSQL will retry until the schema change has been applied to all of the subscribers as well.
Promoting a New Master
If a standby is promoted and the DNS name pointing to it changes the logical replication launcher will resolve the new name. There is one manual step however: physical replication does not copy replication slots, so in the case of failover we need to create a slot manually
SELECT pg_create_logical_replication_slot('localharvest', 'pgoutput');
Shortly after this slot appears the subscriber will receive missing updates
localharvest=# SELECT * FROM pg_stat_subscription; -[ RECORD 1 ]---------+------------------------------ subid | 35039 subname | localharvest pid | 74670 relid | received_lsn | 0/7001078 last_msg_send_time | 2020-01-10 11:57:03.890308-05 last_msg_receipt_time | 2020-01-10 11:57:03.87312-05 latest_end_lsn | 0/7001078 latest_end_time | 2020-01-10 11:57:03.890308-05
Promoting a New Master (subscriber)
Subscriptions are only active on the master, but the standby servers have the configuration
localharvest=# SELECT * FROM pg_stat_subscription; -[ RECORD 1 ]---------+----------------- subid | 32817 subname | localharvest pid | relid | received_lsn | last_msg_send_time | last_msg_receipt_time | latest_end_lsn | latest_end_time |
If a standby is promoted this subscription will become active. I see how this design was intended to help, but this is a insidious hazard: promoting a standby can grab and use the replication slot intended to be used for another master!
Handling Discrepancies
Any time a subscription has been dropped and subsequently recreated there is the opportunity for missed updates. To find the difference between two we need to run a query to compare both sides. Foreign tables provide a flexible means of accessing the upstream data
CREATE EXTENSION postgres_fdw; CREATE SERVER localharvest_query FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'localharvest-query', dbname 'localharvest'); CREATE USER MAPPING FOR localharvest SERVER localharvest_query OPTIONS (user 'localharvest', password '******');
CREATE FOREIGN TABLE product_fdw ( product_id serial PRIMARY KEY, name varchar(48) NOT NULL /* ... */ ) SERVER localharvest_query OPTIONS (schema_name 'public', table_name 'product', fetch_size '50000');
Increasing
fetch_size
will dramatically improve the time it takes to copy a large table.
Once the link is established we can create an index of records that have not
been replicated
CREATE TEMPORARY TABLE missing_product_ids AS SELECT product_fdw FROM product EXCEPT ALL SELECT product_id FROM product;
Then use this information to build a table containing the full contents of the missing records
CREATE TABLE missing AS SELECT product_fdw.* FROM product_fdw JOIN missing_product_ids USING (product_id);
If downtime is acceptable, a simpler solution is to drop the subscription; truncate tables, and create the subscription again. Some other techniques include:
-
Compare values of each record
row_to_json()
-
Speed up comparisons by creating a local copy of the data using
CREATE MATERIALIZED VIEW
Disconnecting Subscribers
If communication is functioning, unhook a subscriber is run
DROP SUBSCRIPTION
,
and the corresponding replication slot will be removed from the master.
If the subscriber cannot communicate with the master the slot must be removed
manually
SELECT * FROM pg_replication_slots; SELECT pg_drop_replication_slot('localharvest');
To drop a subscription that no longer has a replication slot upstream
ALTER SUBSCRIPTION localharvest DISABLE ; ALTER SUBSCRIPTION localharvest SET (slot_name=NONE); DROP SUBSCRIPTION localharvest ;
Diagnosing Jammed Replication
The easiest problem to debug is a key violation: if an entry with a unique index already exists Postgres will log an error. Other failures, such as an exception in a trigger function may not produce an error. Instead the clue is that the logical replication worker repeatedly dies
LOG: logical replication apply worker for subscription "localharvest" has started LOG: background worker "logical replication worker" (PID 240271) exited with exit code 1
UPDATE
and
DELETE
are silently discarded if a given record does not exist.
Manually work around a problem by removing records and re-adding them on the
subscriber.
The subscriber may have extra columns, but it must have all of the same columns as the publisher.
Monitoring Replication Slots
Any slot where
active='f'
is a candidate for an alert.
Another metric is the delay (in bytes)
SELECT slot_name, slot_type, active, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) FROM pg_replication_slots;
Reconciling Tables
One of the persistent issues of managing logical replication is to find which
rows exist in one table but not the other.
By importing a snapshot of a table created with
SELECT ... INTO
we can discover missing records.
There is more than one way to do this; one form is
SELECT * INTO TABLE inventory_missing FROM inventory_copy WHERE NOT EXISTS ( SELECT FROM inventory WHERE id = inventory_copy.id );
Now we have a delta that we can use to fill in the missing records.
INSERT INTO inventory SELECT * FROM inventory_missing; DROP TABLE inventory_copy; DROP TABLE inventory_missing;