Eric Radman : a Journal

Developing PostgreSQL Queries

Unit testing SQL does not seem to be a popular discipline, but I think complex queries have a lot to gain from corroboration through automated testing. As in testing any other kind of code, unit tests help separate concerns, and facilitate decomposition such that the programmer can more easily think through the process for solving a hard problem.

SQL is largely a declarative language for relational data, and as such the programmer is first concerned with telling the database engine what to do with very attention to how the internal execution will occur. Once a query produces the correct result it often must be adjusted so as to provide better performance. This kind of refactoring becomes much less perilous once a set of tests are in place to prove that a new approach delivers the same result.

A Simple Test Runner

At times you may choose not to implement a separate test runner in shell, but a simple wrapper script can provide valuable options for debugging. One of my favorite options to watch for a keywords such as shell which will drop me into the UI for the database before destroying it. This allows me to experiment with the test functions and sample data.

set -e
createdb $DB
cat {stage,query,verify}.sql | psql -tAq $DB 2>&1 | grep -v "^NOTICE"
[[ "$1" == 'shell' ]] && psql $DB
dropdb $DB

This runner assembles a test in three parts: stage.sql sets up utility functions, tables, and sample data, query.sql creates the views to be tested, and verify.sql runs the queries and compares results.

Test Functions

Postgres is nice enough to provide polymorphic types which can be used to create a generic comparison function.

CREATE FUNCTION assert(expected anyelement, got anyelement) RETURNS SETOF void
AS $$
  IF got!=expected THEN
    RAISE EXCEPTION 'got % results, expected %', got, expected;
$$ LANGUAGE plpgsql;

This simple comparison function expects both arguments to be of the same type, and it won't automatically promote types on your behalf.

-- incorrect: count() returns a bigint
SELECT assert(0, (SELECT count(*) FROM machines));
-- correct
SELECT assert(0, (SELECT count(*) FROM machines)::integer);

The SETOF keyword prevents the function from returning a blank record, instead it returns a zero-length list.

Populating Samle Data

INSERT statements can become very verbose. If PostgreSQL is the only target, COPY tends to be more compact and readable since tabs are the standard delimiter.

COPY machines (name, ram) FROM stdin;
computenode1   16
computenode2   16
computenode3   96
computenode4   96

In traditional TDD a minimal fresh fixture is constructed to only supply enough data to satisfy the unit test. The same practice works well when verifying database design as well. The sample data should not include every column the production database contains, only the columns required by the views or queries that are being exersised.

Query Under Test

Because we split the test cycle into three different phases (stage, query, verify), we have the opportunity to create a query file that will be used without alteration to configure the production database. A simple view might read as follows:

    SELECT name FROM machines
    WHERE ram > 16;

Testing for Expected Behavior

Now that we have sample data we can create a new table (prefixed with t_ for test or temporary) that should mirror the expected results of the views under test.

CREATE TEMPORARY TABLE t_himem_machines (
    name character varying(32)

COPY machines (t_himem_machines) FROM stdin;

Now that we have sample data, a query, and an expected result we can write a view that shows the descrepency between the actual and expected results before the exception is raised

    SELECT '+', * FROM (
      SELECT * FROM himem_machines EXCEPT SELECT * FROM sample_machines
    ) AS T1
    SELECT '-', * FROM (
      SELECT * FROM himem_machines EXCEPT SELECT * FROM sample_machines
    ) AS T2;

    SELECT * FROM machine_diff;
    SELECT assert(0, (SELECT count(*) FROM machine_diff)::integer);
    SELECT 'SQL: rerun_candidates ..ok';

Unit Test Schemas

Schemas in Postgres can be used to overlay functionality, and even make stubbing system functions possible.

CREATE ROLE unit_tester ROLE eradman;
CREATE SCHEMA unit_tests AUTHORIZATION unit_tester;
SET search_path TO unit_tests, "$user", public, pg_catalog;

pg_catalog is normally the first element of the search path, but by adding it explicitly you can override built-in functions. There's no concept of a global variable as such, but we can adjust the behavior of our mock functions by updating a parameters table under the unit_tests schema.

-- Mask now() in pg_catalog by creating one in the unit_tests schema
    RETURN timestamp '2011-06-11';
$$ LANGUAGE plpgsql;

Last updated on August 29, 2019