Eric Radman : a Journal

How to Define a Schema

In software, an architect is someone who is responsible for assessing the implications of design decisions as they are made. Some decisions are final, others can be renegotiated later. The libraries and methods you choose to use for accessing an SQL database may not be final, but the cumulative effect of implications is massive.

If you want to define a database schema in a way that is efficient and flexible is only one answer: open up your favorite editor and start writing it down in a text file.

metadata.create_all()

If you use an ORM, it can generate something for you. I don't want to say much about this, because this technique is an arbitrary commitment to use only the most basic features of your database. The only way this strategy can succeed operationally is by applying a series of patches to correct and augment what the framework generated.

A Structure that Scales

For a project of any size one file will become unwieldy, but breaking this down is simple

roles.sql
schema/
  01-extensions.sql
  02-functions.sql
  03-tables.sql
  04-partitions.sql
  05-views.sql
  06-permissions.sql
  07-fixtures.sql
  08-triggers.sql

The reason I treat database roles separately is that they are global, and don't relate to a single database. Applying these to an ephemeral or development database is also easily scripted

#!/bin/sh -e

[ $# == 1 ] || {
    echo "Usage: initdb url"
    exit 1
}

psql="psql $1 --no-psqlrc -v ON_ERROR_STOP=1"

for f in roles.sql schema/*.sql; do
    echo $f
    $psql -o /dev/null -f $f
done

With this in place we can create a precheckin script which ensures that everything is in good order

./initdb $(pg_tmp)

Integration Testing

For some projects it makes sense to pack the database schema into the same repository as the tooling that accesses it, but other databases are accessed by a range of systems. If this is the case, putting the schema into it's own repository gives you a brilliant way to initialize a database before running tests

db=localharvestdb
[ -d $db ] || git clone http://git.eradman.com/$db
cd $db
git pull
./initdb $(pg_tmp -t)

Running the unit tests for each project based on a common, complete schema that is also used in production provides a very consistent platform for verifying that production code is in agreement with the database schema.

Migrations

In my experience, every available tool for generating schema diffs suffers from the same problems: numerous bugs and inadequate of support for the wide range of features that PostgreSQL provides. This is unfortunate; the capability to compare a database with what you have defined is important because it allows you to run a production system that really tracks when is in version control.

To this end I added a new tool to the ephemeralpg called ddl_compare which automates schema comparison using a patched version of surprisingly complete PostgreSQL extension ddlx.

pg_dump -s localharvest > localharvest_dump.sql
pg_compare localharvest_schema.sql localharvest_dump.sql

The output is in a/ and b/

vimdiff {a,b}/farmer

The result is a canonical representation of each table in the database which can be systematic compared.

Last updated on November 27, 2019