Synchronizing PostgreSQL Databases
Option 1: Triggers
At first triggers seem like the logical way to go, but it's not as easy as one might hope because the text for the SQL updates to a slave database have to be reconstructed for every event. There's no such idea as a "calling query", or $@ in the shell world, that can simply be run against a remote database connection.
Option 2: Logging
In terms of simplicity this is definitely the way to go: make Postgres log all UPDATE, INSERT, and DELETE statements and replay them on backup databases. Several modifications need to be made to postgresql.conf:
redirect_stderr = on log_directory = 'pg_log' log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log log_rotation_age = 15 log_line_prefix = ':%u:%d:' log_statement = 'mod'
All of these figures are important. redirect_stderr, log_filename, and log_directory start logging to the directory ~/data/pg_log with log names that are timestamped. The log_rotation_age essentially determines how often (in minutes) updates can be sent out because we only want to scan and remove log files that are not being written to by postmaser.
The string ':%u:%d:' in log_line_prefix sets up a format that a script can use to (a) identify the user that made the query, (b) the database it was called on, and (c) identify log entries that are notifications, and not queries at all; resulting in lines beginning with :::.
Finally, log_statement = 'mod' enables logging for record changes only.
The Log Parser
#!/usr/pkg/bin/ruby
#
# logparser.rb by Eric Radman
# Modified 2007-01-04
PG_LOGS = "/var/pgsql/data/pg_log"
PG_SYNC = "/home/system"
def parselog(logfile)
puts "Processing '#{logfile}'"
lf = File.new(logfile)
sql = db = ""
lf.each_line {|ln|
if ln.slice(0,2) === "::" # Not a query, skip
print '.' # - skip -
elsif ln.slice(0,1) === ":" # New statement
flush_query db, sql # Write query before next statement
print '!' # - new statement -
sql = ln.split(/statement:/)[1]
db = ln.split(':')[2]
else # Everything else is statement body
print '-' # - statement continued -
sql.concat(ln)
end
}
flush_query db, sql # Write whatever was collected 'till EOF
puts
File.delete(logfile)
puts "done"
end
def flush_query(db, sql)
if sql.strip.length > 0 and db != "" then
sql.strip!.concat(";\n")
begin
file = open("#{PG_SYNC}/db_#{db}_updates.sql", "a")
file.syswrite(sql)
end
end
end
if $0 == __FILE__
lst = Dir.entries(PG_LOGS).delete_if {|i| i[0,11] != 'postgresql-'}.sort
if lst.length > 1
parselog "#{PG_LOGS}/#{lst[0]}"
end
end
The if $0 == __FILE__ statement test to see if there's more than one log file in the folder specified by PG_LOGS and if there is, it's parsed and separate log files for each database are written to the folder PG_SYNC.
In my case the destination is not writable by the user pgsql running sqllogparser.rb so I created files with write access in the /home/system.
Scheduling Updates
Now set up the pgsql crontab:
*/6 * * * * /var/pgsql/data/pg_log/parselog.rb
... and watch for mail to be generated with the result of the operation:
$ mail Mail version 8.1 6/6/93. Type ? for help. "/var/mail/pgsql": 1 message 1 new >N 1 root@teisprint.net Thu Jan 04 14:36 9/1253 "Cron <pgsql@am2800--nb0> /var/pgsql/data/pg_log/parselog" & 1 Message 1: From root@teisprint.net Thu Jan 04 14:36:00 2007 Envelope-to: pgsql@localhost Delivery-date: Thu, 04 Jan 2007 14:36:00 -0500 From: root@teisprint.net (Cron Daemon) To: pgsql@localhost.teisprint.net Subject: Cron <pgsql@am2800--nb0> /var/pgsql/data/pg_log/parselog.rb ... Date: Thu, 04 Jan 2007 14:36:00 -0500 Processing '/var/pgsql/data/pg_log/postgresql-2007-01-04_141500.log' !--------------------!--!------------!----!--------------------!----------------- --------!------------!----!-------------------------!--!------------!----!------- -------------------!----------------------!------------!----!-------------------- --!------------!----!---------------------!------------!----!-------------------- !------------!----!------------------------!------------!----!------------------- ----!------------------------!------------------------- done
Assuming that the slave database server has the update files (pushed over NFS, SSH, etc.), a crontab can be set up to apply the updates:
*/20 * * * * /home/system/bin/sync-db.sh
The results to apply:
-rw-r--r-- 1 pgsql wheel 0 Jan 3 16:19 db_pgsql_updates.sql -rw-rw---- 1 pgsql wheel 2016 Jan 4 14:48 db_roundcube_updates.sql -rw-rw---- 1 pgsql wheel 6128 Jan 4 14:48 db_system_updates.sql -rw-r--r-- 1 pgsql wheel 0 Jan 3 16:19 db_tei_updates.sql
My update script:
#!/bin/sh psql -U system system < ~/db_system_updates.sql cat /dev/null > ~/db_system_updates.sql psql -U webmail roundcube < ~/db_roundcube_updates.sql cat /dev/null > ~/db_roundcube_updates.sql
PostgreSQL doesn't allow the database password to be passed in the command line; the passwords are stored in ~/.pgpass.
Not Done Yet
Okay, it couldn't be that easy, could it? It's not. The problem is that this scheme assumes_that the same INSERT statement on the same database running on two different servers will have the same result. One case where this is not true is if an auto-index is used on a column. To deal with the your application to must be coded to set indexes explicitly:
conn = dbh.prepare("SELECT nextval('customers_id_seq')")
conn.execute
Then pass the result of conn.fetch[0] to the function that adds the records instead of relying on nextval() to be executed by the column's DEFAULT parameter.