PostgreSQL Logical Replication
SQL databases provide a critical service by giving the application globally consistent access to data, but logical replication makes it possible to trade in the guarantee that a query always reads the most recent write for other features. Instead of global behavior, logical replication allows a relational database to target particular features for distributed processing.
I may be mistaken, but I believe the ability to replicate a subset of data
between databases that are otherwise unrelated has become a unique feature of
relational databases. Unlike statement-based replication, logical replication
is PostgreSQL is based on the WAL, and enables strict guarantees. It's also a
modular architecture, but the built-in plugin
can usually be assumed.
Terminology and Naming Conventions
Logical replication can employed in arbitrarily complex ways, and in order to keep the relationships strait you need to establish a clear naming conventions
Here is a suggestion for simple configurations where there is a single stream from one database to another:
- Name the subscription based on the database sending data
- Name the publication and slot based on the database receiving data
Note that a subscription may use one or more publications but for simplicity the above recommendation assumes each subscription only has one corresponding publication.
There are two requirements to enable logical replication:
A user with the
PostgreSQL will not allow a subscription to be created until the upstream server meets these requirements.
Selecting Tables for Export
CREATE PUBLICATION archivedb FOR TABLE farmer, product;
commands for some operations are not obvious, but you can get help with
localharvest=> \dRp+ List of publications Name | Owner | All tables | Inserts | Updates | Deletes -----------+--------------+------------+---------+---------+--------- archivedb | localharvest | f | t | t | t (1 row) Tables: "public.farmer" "public.product"
List subscriptions using
SELECT * FROM pg_stat_subscription;
Establishing a Connection
CREATE SUBSCRIPTION localharvest CONNECTION 'host=db1 dbname=localharvest user=localharvest password=XXXXXX' PUBLICATION archivedb WITH ( slot_name=archivedb );
Tables must have a primary key, otherwise, you'll see this unhelpful error
ERROR: relation "public.farmer" does not exist
Also, you can only connect to a master
ERROR: could not create replication slot "localharvest": ERROR: logical decoding cannot be used while in recovery
Once a subscription is created a corresponding replication slot is created on the master
localharvest=# \x localharvest=# select * from pg_replication_slots; -[ RECORD 1 ]-------+-------------- slot_name | archivedb plugin | pgoutput slot_type | logical datoid | 16478 database | localharvest temporary | f active | t active_pid | 41867 xmin | catalog_xmin | 606 restart_lsn | 0/16C0280 confirmed_flush_lsn | 0/16C02B8
Postgres will run triggers on a replicated table, if you set it to respond to remote updates
ALTER TABLE farmer ENABLE REPLICA TRIGGER;
Mirroring Table Definitions
The synchronization of table contents is handled automatically, providing you have created the tables in both places.
# publisher pg_dump -s -t farmer -t product > replicated_tables.sql
# subscriber psql -f replicated_tables.sql
If a column is added in one database, replication will be blocked but PostgreSQL will retry until the schema change has been applied to all of the subscribers as well.
Promoting a New Master
If you promote a standby to a master and change the DNS name pointing to it the logical replication launcher will resolve the new name. There is one manual step however: physical replication does not copy replication slots, so in the case of failover we need to create a slot manually
SELECT pg_create_logical_replication_slot('localharvest', 'pgoutput');
Shortly after this slot appears the subscriber will receive missing updates
localharvest=# SELECT * FROM pg_stat_subscription; -[ RECORD 1 ]---------+------------------------------ subid | 35039 subname | localharvest pid | 74670 relid | received_lsn | 0/7001078 last_msg_send_time | 2020-01-10 11:57:03.890308-05 last_msg_receipt_time | 2020-01-10 11:57:03.87312-05 latest_end_lsn | 0/7001078 latest_end_time | 2020-01-10 11:57:03.890308-05
Promoting a New Master (subscriber)
Subscriptions are only active on the master, but the standby servers have the configuration
localharvest=# SELECT * FROM pg_stat_subscription; -[ RECORD 1 ]---------+----------------- subid | 32817 subname | localharvest pid | relid | received_lsn | last_msg_send_time | last_msg_receipt_time | latest_end_lsn | latest_end_time |
If a standby is promoted this subscription will become active. I see how this design was intended to help, but this is a insidious hazard: promoting a standby can grab and use the replication slot you intended to be used for another master!
Any time a subscription has been dropped and subsequently recreated there is the opportunity for missed updates. To find the difference between two we need to run a query to compare both sides. Foreign tables provide a flexible means of accessing the upstream data
CREATE EXTENSION postgres_fdw; CREATE SERVER localharvest_query FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'localharvest-query', dbname 'localharvest'); CREATE USER MAPPING FOR localharvest SERVER localharvest_query OPTIONS (user 'localharvest', password '******');
CREATE FOREIGN TABLE product_fdw ( product_id serial PRIMARY KEY, name varchar(48) NOT NULL /* ... */ ) SERVER localharvest_query OPTIONS (schema_name 'public', table_name 'product', fetch_size '50000');
will dramatically improve the time it takes to copy a large table. Once the
link is established we can create an index of records that have not been
CREATE TEMPORARY TABLE missing_product_ids AS SELECT product_fdw FROM product EXCEPT ALL SELECT product_id FROM product;
Then use this information to build a table containing the full contents of the missing records
CREATE TABLE missing AS SELECT product_fdw.* FROM product_fdw JOIN missing_product_ids USING (product_id);
If downtime is acceptable, a simpler solution is to drop the subscription; truncate tables, and create the subscription again. Some other techniques include:
Compare values of each record
Speed up comparisons by creating a local copy of the data using
CREATE MATERIALIZED VIEW
If communication is functioning, all you have to do to unhook a subscriber is run
and the corresponding replication slot will be removed from the master. If
the subscriber cannot communicate with the master the slot must be removed
SELECT * FROM pg_replication_slots; SELECT pg_drop_replication_slot('localharvest');
To drop a subscription that no longer has a replication slot upstream
ALTER SUBSCRIPTION localharvest DISABLE ; ALTER SUBSCRIPTION localharvest SET (slot_name=NONE); DROP SUBSCRIPTION localharvest ;
Diagnosing Jammed Replication
The easiest problem to debug is a key violation: if an entry with a unique index already exists Postgres will log an error. Other failures, such as an exception in a trigger function may not produce an error. Instead the clue is that the logical replication worker repeatedly dies
LOG: logical replication apply worker for subscription "localharvest" has started LOG: background worker "logical replication worker" (PID 240271) exited with exit code 1
are silently discarded if a given record does not exist, so you can manually
work around a problem by removing records and re-adding them on the subscriber.
The subscriber may have extra columns, but it must have all of the same columns as the publisher. If you drop a required column before an update is pushed you will see an error like this
logical replication target relation "public.farmer" is missing some replicated columns
You can add the missing columns or (if possible) drop and recreate the tables. Then synchronize the data using
ALTER SUBSCRIPTION localharvest REFRESH PUBLICATION WITH ( copy_data=false );
Refreshing the publication will also copy the data for tables that were added the publication upstream.
One of the persistent issues of managing logical replication is to find which
rows exist in one table but not the other. By importing a snapshot of a table
SELECT ... INTO
we can discover missing records. There is more than one way to do this; one
SELECT * INTO TABLE inventory_missing FROM inventory_copy WHERE NOT EXISTS ( SELECT FROM inventory WHERE id = inventory_copy.id );
Now we have a delta that we can use to fill in the missing records.
INSERT INTO inventory SELECT * FROM inventory_missing; DROP TABLE inventory_copy; DROP TABLE inventory_missing;
Can we configure bidirectional replication using native logical replication? The answer is no:
- Logical replication does not yet provide a means of implementing conflict resolution.
- There is mechanism to control propagation of updates or avoid cycles.