Eric Radman : a Journal

Database Test Isolation

Some applications access a database, but never perform updates—in this case writing non-interacting tests is easy enough: load a schema and sample data, then run queries against that data. Whenever data is modified we need a strategy for preventing test interaction. The options are:

  1. Wrap each test in a transaction
  2. Truncate tables and load fixtures before each test
  3. Create a copy of the database for each test
  4. Revert the database to a restorepoint after each test

Each of these approaches incurs a varying level of complexity. Options 1–3 provide responsive feedback; options 2–4 provide the full set of capabilities:

Overhead of less then 50ms for each test
Any data update is permissible
Full capability to exercise transactions
Can use session-level features
Connection pools are okay
External utilities invoked using a connection URI

Option 1: Begin, Rollback

PostgreSQL has a feature called savepoints, which at first to allow for nested transactions, but this is not the case. Not only is the syntax different, but any error causes the entire outer transaction to fail, and must be manually aborted.

As long as the system under test does not perform it's own transaction handling wrapping each test in a transaction should suffice

-- test 1

-- test 2

The challenge to writing a test runner is to ensure that the same database connection is used throughout all of the tests. For some components (such as external binaries) this is may be impossible.

Option 2: Truncate

Truncating tables and loading test data at the start of each test is a sensible approach, although it is difficult to estimate the overhead added to run each test. You may need to disable triggers and constraints while reloading the data

SET session_replication_role = 'replica';
TRUNCATE table_a;
TRUNCATE table_b;

SET session_replication_role = 'origin';

This approach may require a bit of extra work, such as refreshing materialized views or resetting sequences, but is still a clean solution since each test is not responsible for cleanup.

Option 3: Copy Database

In my view the most capable approach is to create a new database for each test, using the initial test schema and data as the template

-- create template
-- load schema and test data

-- test 1 (pg_url + '&dbname=test_1')

-- test 2 (pg_url + '&dbname=test_2')

The setup for each test is quick: about 30ms. Because you have a complete database there are very few restrictions.

Option 4: Point-in-Time Recovery

A fork of Postgres such as Zenith that supports "branching" provides a means of creating an isolated database using copy-on-write at the storage layer. It is also possible to do this using a file system that can restore to a snapshot.

The only time I would deep this approach is worthwhile is if test database is large.