Eric Radman : a Journal

How to Define a Schema

In software, an architect is someone who is responsible for assessing the implications of design decisions as they are made. Some decisions are final, others can be renegotiated later. The libraries and methods you choose to use for accessing an SQL database may not be final, but the cumulative effect of implications is massive.

If you want to define a database schema in a way that is efficient and flexible is only one answer: open up your favorite editor and start writing it down in a text file.

metadata.create_all()

If you use an ORM, it can generate something for you. I don't want to say much about this, because this technique is an arbitrary commitment to use only the most basic features of your database. The only way this strategy can succeed operationally is by applying a series of patches to correct and augment what the framework generated.

A Structure that Scales

For a project of any size one file will become unwieldy, but breaking this down is simple

roles.sql
schema/
  01-extensions.sql
  02-functions.sql
  03-tables.sql
  04-partitions.sql
  05-views.sql
  06-permissions.sql
  07-fixtures.sql
  08-triggers.sql

The reason I treat database roles separately is that they are global, and don't relate to a single database. Applying these to an ephemeral or development database is also easily scripted

#!/bin/sh -e

[ $# == 1 ] || {
    echo "Usage: initdb url"
    exit 1
}

psql="psql $1 --no-psqlrc -v ON_ERROR_STOP=1"

for f in roles.sql schema/*.sql; do
    echo $f
    $psql -o /dev/null -f $f
done

With this in place we can create a precheckin script which ensures that everything is in good order

./initdb $(pg_tmp)

Integration Testing

For some projects it makes sense to pack the database schema into the same repository as the tooling that accesses it, but other databases are accessed by a range of systems. If this is the case, putting the schema into it's own repository gives you a brilliant way to initialize a database before running tests

db=localharvestdb
[ -d $db ] || git clone http://git.eradman.com/$db
cd $db
git pull
./initdb $(pg_tmp -t)

Running the unit tests for each project based on a common, complete schema that is also used in production provides a very consistent platform for verifying that production code is in agreement with the database schema.

Migrations

No, schema comparison tools are far from perfect, but I have had good success with migra. These tools will likely produce some spurious output because of missing features in the migration tool, or there may be real differences in the production database that should be ignored.

In both instances the solution is the same: create a dump and restore of the production database schema to an ephemeral database and make alterations before running the comparison. Implementing this step is more than a matter of convenience! Comparing schemes allows you to run a production system that really tracks when is in version control.

Last updated on February 25, 2019