Eric Radman : a Journal

Ephemeral Postgres Databases

For some time proponents of unit testing have asserted that unit tests should not touch a real database; the standard practice was to mock these interactions at the level of language-native objects. Does this rule apply if the database is in memory? An in-process engine such as SQLite seems to solve this problem nicely, since the database can run without touching the file system. This is how running Django tests might look:

# settings_fast.py
# run using
#   python manage.py test --settings=settings_fast

from settings import *

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': ':memory:',
    }
}

Automatic PG Creation and Teardown

Using an SQLite in-memory database is a very useful technique, but it limits your application to the minimal set of features common to both database engines. If you're not using an ORM you can maintain complete test coverage and take advantage of PostgreSQL features by spinning up a temporary database. In Python this can be automated in-process using testing.postgresql:

import subprocess
import testing.postgresql

# Temporary Database

def init_postgres():
    postgresql = testing.postgresql.Postgresql()
    print "postgresql up and running at %s" % postgresql.url()
    return postgresql

def setup_static_fetch(postgresql):
    subprocess.check_output(['psql', postgresql.url(), '-q', '-f', 'schema.sql'])

# Initialize App

postgresql = init_postgres()
setup_static_fetch(postgresql)

import app

Now we can start to write unit tests using this new ephemeral database. This is how one might set up a Tornado application:

import unittest
import psycopg2
from tornado.options import options
from tornado.testing import AsyncHTTPTestCase, LogTrapTestCase

class TestUI(AsyncHTTPTestCase, LogTrapTestCase):

    @classmethod
    def setUpClass(cls):
        app.application.db = psycopg2.connect(options.dsn)
        app.application.db.set_isolation_level(
                psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)

    @classmethod
    def tearDownClass(cls):
        if not app.application.db.closed:
            app.application.db.close()

    def get_app(self):
        return app.application

    def setUp(self):
        # use the db connection from the server so that transactions are effective
        self.cur = app.application.db.cursor()
        self.cur.execute("BEGIN;")
        super(TestUI, self).setUp()

    def tearDown(self):
        self.cur.execute("ROLLBACK;")
        self.cur.close()
        super(TestUI, self).tearDown()

And our first test:

    def test_list_urls(self):
        self.http_client.fetch(self.get_url("/"), self.stop)
        response = self.wait()
        self.assertEqual(response.code, 200)

if __name__ == '__main__':
    options.dsn = postgresql.url()
    unittest.main()

Time-Limited Database Instances

In Python there are some tricky edge cases that can easily hang the test runner if an uncaught exception occurs while running the unit tests. One solution is to spin up the database in a separate daemon:

#!/usr/bin/env python

import time
import daemon
import testing.postgresql

def wait(seconds=60):
    time.sleep(seconds)

if __name__ == "__main__":
    postgresql = testing.postgresql.Postgresql()
    print(postgresql.url())
    with daemon.DaemonContext():
        wait()

With the help of a simple script, the Test::PostgreSQL Perl module can also be used to quickly spin up a database before running tests:

#!/usr/bin/env perl

use strict;
use warnings;

use Test::PostgreSQL;
use Proc::Daemon;

my $pgsql = Test::PostgreSQL->new();
print $pgsql->uri, "\n";

Proc::Daemon::Init;
sleep(30);

In both cases the library automatically stops and removes the temporary database once the sleep timeout expires and the daemon exits. Not only is this technique robust, but it can be used from any language, including a test runner written in shell:

#!/bin/sh
url=$(./startpg.py)
psql -f schema.sql "$url"
# ...

Responsive Testing

It takes less than ten seconds to initialize a new database and start Postgres, but it would be a huge boost to efficiency if one were available in less than a second. To accomplish this I published a utility called pg_tmp. This example uses Ruby:

require 'pg'

url = %x( pg_tmp -t ).chomp  # strip the trailing newline
puts "Using #{url}"
conn = PG.connect(url)
result = conn.exec("SELECT now() AS number")
for row in result do
    puts row['number']
end

pg_tmp uses several tricks to reduce the wait time for a new database to less than a second. The temporary database will be garbage-collected asynchronously. By default pg_tmp returns the path to a Unix socket, but an unused TCP port can be selected instead using the -t option. This is necessary for Java since Unix sockets are not supported.
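
Since pg_tmp simply prints a connection string, the default socket-based mode is just as easy to use. Here is a minimal sketch in Python, assuming pg_tmp is on the PATH and that psycopg2 is installed (the URI is handed straight to libpq, just as in the Ruby example above):

import subprocess
import psycopg2

# ask pg_tmp for a database reachable over a Unix socket (the default)
url = subprocess.check_output(['pg_tmp']).decode().strip()

# libpq understands the URI directly, so it can be used as the DSN
conn = psycopg2.connect(url)
cur = conn.cursor()
cur.execute("SELECT now()")
print(cur.fetchone()[0])
conn.close()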

Mocking Time

Shadow the built-in now() function to return the real clock timestamp:

CREATE SCHEMA unit_tests;
CREATE OR REPLACE FUNCTION unit_tests.now() RETURNS timestamp AS $$
BEGIN
  RETURN clock_timestamp();
END;
$$ LANGUAGE plpgsql;

Or, to fix the time:

  RETURN timestamp '2021-06-06 12:00:00';

To make this the default for the test user of an ephemeral database:

ALTER USER CURRENT_USER SET search_path = unit_tests, public, pg_catalog;
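
As a quick check that the shadowed function actually wins, the following rough sketch (my own illustration, reusing pg_tmp and psycopg2) loads the statements above and compares the unqualified now() with pg_catalog.now(). Note that the per-user setting only applies to new sessions, so a reconnect is required:

import subprocess
import psycopg2

url = subprocess.check_output(['pg_tmp', '-t']).decode().strip()

setup = """
CREATE SCHEMA unit_tests;
CREATE OR REPLACE FUNCTION unit_tests.now() RETURNS timestamp AS $$
BEGIN
  RETURN timestamp '2021-06-06 12:00:00';
END;
$$ LANGUAGE plpgsql;
ALTER USER CURRENT_USER SET search_path = unit_tests, public, pg_catalog;
"""

conn = psycopg2.connect(url)
conn.cursor().execute(setup)
conn.commit()
conn.close()

# the per-user search_path applies to new sessions, so reconnect
conn = psycopg2.connect(url)
cur = conn.cursor()
cur.execute("SELECT now(), pg_catalog.now()")
mocked, real = cur.fetchone()
print(mocked)   # 2021-06-06 12:00:00
print(real)     # the actual transaction timestamp
conn.close()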