Eric Radman : a Journal

Migrating from MySQL

I like to say that all database problems are application problems because building a responsive application requires cooperation with the facilities that the database provides. The implicit terms of the contract become most obvious when you try to swap out the backend.

The matter of first importance is to keep a copy of the database running that allows progressive testing and migration of services.

Creating a Replica

If you need a live stream of updates from a MySQL database to Postgres, you may be able to use pg_chameleon. This kind of replication is feasible because MySQL databases are normally used in a very simple way.

Another method is simply to create and restore a periodic dump; mysqldump provides most of what you need to get off the ground:

#!/bin/sh
tables="
  locations
  subnets
  aliases
"
for t in $tables
do
  mysqldump -h db1 \
      -u readonly \
      --password="xyz" \
      --skip-lock-tables \
      --skip-add-locks \
      --compatible=postgresql \
      --complete-insert \
      --no-create-info \
      --skip-extended-insert mydb $t > $t.dump
done

This will create a separate dump for each table. The options above avoid taking locks on the source, include column names in every INSERT, and write one row per statement so that a failing row can be isolated during the restore.

Stream Editor Tasks

MySQL column and table names are not case-sensitive, so resolving these naming discrepancies is inevitable:

sed -i 's/"Id"/"ID"/g' locations.dump

PostgreSQL is much stricter about type and referential integrity, so sed can be used to fix up data that would otherwise fail to load. In this example we replace bogus zero timestamps with NULL and prefix string literals with E so that backslash escapes are honored:

sed -i -e "s/'0000-00-00 00:00:00'/NULL/g" -e "s/,'/,E'/g" aliases.dump

Restoring Data

Imperative for any migration is a means of reproducing the schema. For SQLAlchemy this is accomplished using create_all():

from sqlalchemy import Column, types
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Subnets(Base):
    __tablename__ = "subnets"
    subnet_id = Column(types.Integer, primary_key=True)
    name = Column(types.Unicode(20), unique=True)

uri = "postgresql://postgres@localhost/test"
engine = create_engine(uri)
Base.metadata.create_all(bind=engine)

Laying the schema creation out in a single text file is also a great tactic.
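
For reference, the single-file equivalent is plain DDL; this sketch approximates what create_all() emits for the Subnets class above:

-- schema.sql (hypothetical): the hand-written equivalent of create_all()
CREATE TABLE subnets (
    subnet_id serial PRIMARY KEY,
    name varchar(20) UNIQUE
);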

MySQL does not quote strings in a SQL-standard-compliant way. PostgreSQL can read MySQL strings if these options are prepended to each dump:

SET standard_conforming_strings = 'off';
SET backslash_quote = 'on';
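
One way to apply these settings without editing every dump file is a small psql wrapper script; a sketch, assuming a dump named aliases.dump in the current directory:

-- restore.sql: set MySQL-compatible quoting, then include the dump
SET standard_conforming_strings = 'off';
SET backslash_quote = 'on';
\i aliases.dump

Running psql -f restore.sql applies the options before the dump is read.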

By default, psql(1) prints warnings and errors to stderr but does not exit non-zero if an SQL statement fails. This may be fine, but for some operations you can fail the entire script by adding an ERR trap and setting ON_ERROR_STOP so that psql returns an error code:

#!/bin/sh
# note: the ERR trap is not POSIX; it requires a shell such as bash or ksh
trap 'printf "$0: exit code $? on line $LINENO\n" >&2; exit 1' ERR

psql $url -v ON_ERROR_STOP=1 -f my.dump

Resetting Sequences

Once the data is restored we need to reset each sequence so that subsequent inserts to our database do not collide with existing keys:

for t in $tables
do
  max_id=$(psql -Atq $db <<SQL
  SELECT MAX("ID") FROM "$t";
SQL
)
  psql -q $db <<SQL
  ALTER SEQUENCE "${t}_ID_seq" RESTART WITH $(($max_id+1));
SQL
done
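
The same reset can be written as one statement per table with setval() and pg_get_serial_sequence(); a sketch for the locations table, assuming its serial column is named "ID" and the table is not empty:

-- setval() records MAX("ID") as already used, so nextval() returns MAX("ID")+1
SELECT setval(pg_get_serial_sequence('locations', 'ID'), MAX("ID"))
  FROM locations;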

Again, much of this kind of work can be automatically translated by pg_chameleon. One advantage to scripting these steps one by one is that each error can be addressed individually without holding up the entire mechanism.

Adjusting Column Default Values

Some differences in the way applications handle data from a database can be smoothed over by adjusting column defaults. PostgreSQL's timestamptz, for example, reports fractional seconds, whereas MySQL's DATETIME type only reports whole seconds. We can mimic this behavior when recording the current time using a column default like this:

DEFAULT date_trunc('second', now())
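
Applied to an existing table, the same default can be installed with ALTER TABLE; a sketch using the ticket table from the ORM example below:

ALTER TABLE ticket
    ALTER COLUMN added SET DEFAULT date_trunc('second', now());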

What if we're using an ORM? If all else fails, apply a schema adjustment as a separate operation after create_all(). SQLAlchemy allows you to compose these functions and then apply the result by setting server_default:

from sqlalchemy import Column, DateTime, Integer, literal_column, func

literal_second = literal_column("'second'")
db_now = func.date_trunc(literal_second, func.now())

class Ticket(Base):
    __tablename__ = 'ticket'
    id = Column(Integer(), primary_key=True)
    added = Column(DateTime(), server_default=db_now)

Access Over FDW

Another option for migrating data is to establish a link over foreign data wrappers using the mysql_fdw extension:

CREATE EXTENSION mysql_fdw;

CREATE SCHEMA mysql_fdw;

CREATE SERVER db1
    FOREIGN DATA WRAPPER mysql_fdw
    OPTIONS (host '172.16.0.10', port '3306');

CREATE USER MAPPING FOR localharvest
    SERVER db1
    OPTIONS (username 'readonly', password '********');

Now connect as an unprivileged user and import all the tables into a dedicated schema:

IMPORT FOREIGN SCHEMA cluster FROM SERVER db1 INTO mysql_fdw;

List foreign tables in this schema using \dE mysql_fdw.*.
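
From here data can be copied into local tables with ordinary INSERT ... SELECT statements; a sketch, assuming the foreign subnets table has the same column layout as the local one:

-- copy rows from the MySQL side into the local table
INSERT INTO subnets (subnet_id, name)
SELECT subnet_id, name
  FROM mysql_fdw.subnets;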