Migrating from MySQL
I like to say that all database problems are application problems because building a responsive application requires cooperation with the facilities that the database provides. The implicit terms of the contract become most obvious when you try to swap out the backend.
The first order of business is to keep a copy of the database running so that services can be tested and migrated progressively.
Creating a Replica
If you need a live stream of updates between a MySQL database and Postgres, you may be able to use pg_chameleon. This project is possible because MySQL databases are normally used in a very simplistic way.
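As a rough sketch of that workflow (the command names are from pg_chameleon 2.x; the flags and the YAML configuration it expects should be verified against the version you install):

#!/bin/sh
# Typical pg_chameleon 2.x sequence -- check flags against your version.
chameleon set_configuration_files
chameleon create_replica_schema --config default
chameleon add_source --config default --source mysql
chameleon init_replica --config default --source mysql
chameleon start_replica --config default --source mysql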
Another method is simply to create and restore a periodic dump.
mysqldump provides most of what you need to get off the ground:
#!/bin/sh

tables="
locations
subnets
aliases
"

for t in $tables
do
    mysqldump -h db1 \
        -u readonly \
        --password="xyz" \
        --skip-lock-tables \
        --skip-add-locks \
        --compatible=postgresql \
        --complete-insert \
        --no-create-info \
        --skip-extended-insert \
        mydb $t > $t.dump
done
This will create a separate dump for each table. A few of the options deserve explanation:
- --skip-lock-tables allows backups from a read-only connection
- --complete-insert specifies column names so that we are not relying on the order in which they happen to be defined
- --skip-extended-insert generates one line per record. This is very verbose but allows easy navigation of the resulting file with grep or your favorite editor (a sample line is shown below)
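With these options each record becomes a complete, self-contained INSERT on its own line. Assuming a subnets table with subnet_id and name columns (the model shown later uses these), a dump line looks roughly like this; the double-quoted identifiers come from --compatible=postgresql enabling ANSI quoting:

INSERT INTO "subnets" ("subnet_id", "name") VALUES (1,'office');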
Stream Editor Tasks
MySQL column and table names are not case sensitive, so resolving case discrepancies is inevitable:
sed -i 's/"Id"/"ID"/g' locations.dump
PostgreSQL is much stricter about type and referential integrity, so sed can be used to fix up data that would otherwise fail to load. In this example we replace bogus zero dates with NULL and mark string literals as escape strings:
sed -i -e "s/'0000-00-00 00:00:00'/NULL/g" -e "s/,'/,E'/g" aliases.dump
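A quick check confirms the substitutions took; a clean file prints a count of zero:

grep -c '0000-00-00' aliases.dump   # 0 means no bogus dates remain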
Restoring Data
Imperative for any migration is a means of reproducing the schema. With SQLAlchemy this is accomplished using create_all:
from sqlalchemy import Column, types
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Subnets(Base):
    __tablename__ = "subnets"
    subnet_id = Column(types.Integer, primary_key=True)
    name = Column(types.Unicode(20), unique=True)

uri = "postgresql://postgres@localhost/test"
engine = create_engine(uri)
Base.metadata.create_all(bind=engine)
Laying the schema creation out in a single text file is also a great tactic.
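One way to produce that file, reusing the Subnets model above, is to compile the table definition against the PostgreSQL dialect; CreateTable and the dialect module are part of SQLAlchemy proper:

from sqlalchemy.dialects import postgresql
from sqlalchemy.schema import CreateTable

# Render the DDL that create_all() would execute into a reviewable file.
with open("schema.sql", "w") as f:
    f.write(str(CreateTable(Subnets.__table__).compile(dialect=postgresql.dialect())))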
MySQL does not quote strings in a SQL-standard compliant way. PostgreSQL can read MySQL strings if you prepend these settings to each dump:
SET standard_conforming_strings = 'off';
SET backslash_quote = 'on';
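A short loop can prepend those two settings to every dump produced by the script above (the $t.dump naming is carried over from there):

#!/bin/sh
for t in locations subnets aliases
do
    printf "SET standard_conforming_strings = 'off';\nSET backslash_quote = 'on';\n" |
        cat - $t.dump > $t.sql
done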
By default, psql(1) prints warnings and errors to stderr but keeps processing the file. We can halt the restore at the first failure by installing an ERR trap and setting psql to return an error code:
#!/bin/bash
# The ERR trap is a bash feature, hence the bash shebang.
trap 'printf "$0: exit code $? on line $LINENO\n" >&2; exit 1' ERR

psql $url -v ON_ERROR_STOP=1 -f my.dump
Resetting Sequences
Once the data is restored, we need to reset the sequences so that subsequent inserts do not collide with existing rows:
for t in $tables
do
    max_id=$(psql -Atq $db <<SQL
SELECT MAX("ID") FROM "$t";
SQL
)
    psql -q $db <<SQL
ALTER SEQUENCE "${t}_ID_seq" RESTART WITH $(($max_id+1));
SQL
done
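If you prefer to stay in SQL, the same reset can be expressed per table with setval and pg_get_serial_sequence; the COALESCE guards against empty tables, which the shell version above does not:

-- Equivalent one-statement reset for the subnets table; the final
-- 'false' argument makes the next nextval() return this value as-is.
SELECT setval(pg_get_serial_sequence('"subnets"', 'ID'),
              COALESCE(MAX("ID"), 0) + 1, false)
FROM "subnets";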
Again, much of this kind of work can be automatically translated by pg_chameleon. One advantage to scripting these steps one by one is that each error can be addressed individually without holding up the entire mechanism.
Adjusting Column Default Values
Some differences in the way applications handle data from the database can be smoothed over by adjusting column defaults. PostgreSQL's timestamptz, for example, reports fractional seconds, whereas MySQL's datetime type only reports whole seconds. We can mimic MySQL's behavior when recording the current time with a column default like this:
DEFAULT date_trunc('second', now())
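Applied directly to an existing table (the names here are illustrative), the adjustment is a single statement:

ALTER TABLE ticket ALTER COLUMN added SET DEFAULT date_trunc('second', now());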
What if we're using an ORM? If all else fails, apply the schema adjustment as a separate operation after create_all(). SQLAlchemy also allows you to compose these functions and apply them by setting server_default:
from sqlalchemy import Column, DateTime, Integer, func, literal_column

# date_trunc('second', now()) evaluated on the server at insert time
literal_second = literal_column("'second'")
db_now = func.date_trunc(literal_second, func.now())

class Ticket(Base):
    __tablename__ = 'ticket'
    id = Column(Integer(), primary_key=True)
    added = Column(DateTime(), server_default=db_now)
Access Over FDW
Another option for migrating data is to establish a link with foreign data wrappers using the mysql_fdw extension:
CREATE EXTENSION mysql_fdw;  -- load the wrapper first

CREATE SCHEMA mysql_fdw;

CREATE SERVER db1 FOREIGN DATA WRAPPER mysql_fdw
    OPTIONS (host '172.16.0.10', port '3306');

CREATE USER MAPPING FOR localharvest SERVER db1
    OPTIONS (username 'readonly', password '********');
Now connect as an unprivileged user and import all tables into a dedicated schema:
IMPORT FOREIGN SCHEMA cluster FROM SERVER db1 INTO mysql_fdw;
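From there data can be copied a table at a time with plain INSERT ... SELECT; this sketch assumes a local subnets table of the same shape, and in real use you would list the columns explicitly since local and foreign column order may differ:

INSERT INTO subnets SELECT * FROM mysql_fdw.subnets;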
List foreign tables in this schema using \dE mysql_fdw.*.