Overcoming First Principles

A guide for accessing the features of PostgreSQL in test-driven development

<ericshane@eradman.com>

PGConf 2016, New York City

Abstract

While integration tests and system tests are valuable for catching regressions introduced by a commit, it is fast, self-contained unit tests that provide the tight feedback loop which makes test-driven development possible.

Effective unit testing requires a software designer to define a set of abstractions which will comprise a particular system. If unit tests become brittle obstacles to refactoring or ineffective in providing design insight, then the approach we use for testing must be revised.

In this talk we will explore methods of treating PostgreSQL as a runtime component of your application and not as an external service. The end goal will be to learn to write tests that neatly capture units of functionality which depend on an SQL engine.

Preamble

In Richard Hipp's 2014 keynote address in Ottawa, he contrasted SQLite with PostgreSQL: PG is "Client/Server", SQLite is "Serverless", and so on. In the conclusion of his talk he made a compact recommendation that describes the perspective you need in order to use Postgres effectively:

Think of PostgreSQL as a programming language

This is an astute observation. When used as a programming environment, Postgres becomes part of your application runtime, and applications written to take advantage of this runtime require an approach to unit testing that verifies this functionality at the right level.

Everything we cover here will be applicable to system tests and integration tests, but we will focus on unit tests in particular.

Unit testing is a software engineering discipline based on a premise put forth by Kent Beck which states that difficulty testing is a symptom of poor design. Kent's articulation of this principle can be summed up like this:

Design that is conducive to testing also depends on structural components that are at once easy to comprehend, easy to use in different contexts, and easy to modify.

For most programmers that I know, the constraints of unit testing are perplexing. But I have found that non-programmers sometimes get this if I explain it in this way:

The best way to build software that can be adapted and maintained is to build software that is testable. One way to make testable software is to write tests first.

This sounds like a formula, but you, the programmer, are still making decisions. One of the significant challenges in this process is to find what Robert Martin identifies as architecturally significant boundaries [1].

With this in mind, let's think through some patterns for the design of systems that leverage a database.

Axiomatic Unit Testing

A test is not a unit test if
  1. It talks to the database
  2. It communicates across the network
  3. It touches the file system
  4. It can't run at the same time as any of your other unit tests
  5. You have to do special things to your environment
Michael Feathers, September 2005 "Don't let slow tests bog you down"

Unit tests cover a unit of functionality. To use the metaphor of human language, the goal is not to identify "sentence fragments", but "whole paragraphs" that represent a coherent operation. Finding the right level to zoom in to is about finding a point of reference that takes into account the logic of the problem you are trying to solve.

In order to communicate the unique role that unit tests play, people have devised rules for describing what they can and cannot do.

Here is a typical set of rules. Let's try to state these axioms positively (in reverse order):

(5) Self-hosting test harness. git clone; make test is all you have to do to run the tests

(4) Implies dynamic allocation of port numbers and other global or shared resources

(3) Tests run independently of the contents of the host file system by emulating system calls such as open(2), write(2), lseek(2), and so on

(2) Tests run locally, hence they are indifferent to the state of the network.

(1) What is the motivation for prohibiting database access?

These prohibitions are generally true of commercial databases. Parenthetically, I would add that commercial databases are hostile to software engineering precisely because they do not facilitate fast, repeatable tests.

None of these problems is difficult to overcome with open-source software.

That said, because Postgres is designed to scale up and not down, it does place significant demands on the test harness.

SQLite imposes none of these complexities; will SQLite do?

Will SQLite Do?

cd ~/src/Django-1.8.3
PYTHONPATH=. python3.4 tests/runtests.py \
    --settings=test_sqlite migrations
sqlite/3.8 315 tests in 14.45 seconds
postgresql/9.4 315 tests in 36.82 seconds

Benefit: easy setup, fast

Cost: give up the advanced features of PostgreSQL

Treating SQLite as a replacement for PG in your application is tempting: it's fast and doesn't require disk access...and it might be a correct representation of what happens in production.

The ability to use SQLite as a "fake" instance of PG is a useful option. But, giving up on functionality that is useful to your application is a stiff tradeoff.

Will SQLite do? As a place to start, yes. The temptation to use SQLite for running tests is a good one, because you are already starting to define the scope of your tests by teaching your application to take advantage of the database.

The Testing Focal Plane

You can mock an API, but you cannot mock a language

SQL is a declarative language

Mock high

How many of you know what the "PL" stands for in PL/pgSQL? No, it's not Programming Language, it's Procedural Language/PostgreSQL. That is in contrast with standard SQL, which is a Declarative Language.

If SQL is a language, how do you write tests for an application that interacts with a database?

Here's what you do: encapsulate all access to your database through an implementation of an interface called MyDatabaseService (or whatever). ... The mock will have no logic: it will simply return the data required -- James Sadler, September 2005

My interpretation of this advice is this: reduce SQL to the smallest set of operators and logical constructs that you can live with and then test each code fragment.
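For illustration, here is a minimal Python sketch of the approach Sadler describes. All of the names here (MyDatabaseService, find_user, greeting) are hypothetical; the point is that the mock contains no logic and simply returns the data required:

```python
from unittest.mock import Mock

class MyDatabaseService:
    """Interface that hides all SQL from the rest of the application."""
    def find_user(self, user_id):
        raise NotImplementedError  # the real implementation would run a query

def greeting(db, user_id):
    # Application logic under test: no SQL, only calls to the interface
    user = db.find_user(user_id)
    return "Hello, %s" % user["name"]

# In a unit test, the service is replaced with a mock that simply
# returns canned data
db = Mock(spec=MyDatabaseService)
db.find_user.return_value = {"id": 7, "name": "Alice"}
print(greeting(db, 7))  # -> Hello, Alice
```

Note that the mock only knows how to answer the one call the test arranged for; every query shape your application uses must be mimicked by hand.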

Could you test an application that used embedded JavaScript this way? Perhaps, but aside from the very painful process of emulating bits of the JavaScript VM you would be creating poor tests for one very important reason:

You are limiting the use of JavaScript in your application to the means of combination that you were able or willing to mimic in your test framework.

Remember the thesis: difficulty testing is a symptom of poor design. If you are having this much trouble testing, then you are probably doing damage to your production code as well. So what is the alternative?

"Mock High" is a term I learned from Ryan Davis; it means you use mocks only to detach from unreliable resources.

If you do this, you have adopted what I would call Coextensive Unit Testing

Coextensive Unit Testing

Responsive test cases which

Cover all essential behaviors of a specific functional unit

Rely on the test harness to establish runtime integration

A unit test is commensurate with individual functional units. Hence, the 5 rules I started with could be redrawn like this:

  1. Covers all data manipulation logic
  2. Runs on a host with or without networking
  3. Is agnostic to file system layout
  4. Can be run without requiring locks on specific resources
  5. Is portable, deterministic, and responsive

The subtext to what I am calling Coextensive Unit Testing is that you do not normally add mocks for speed[1][2]. If you have slow tests you should figure out why they are slow. In 2005 Curt Sampson wrote:

Sure, one approach is not to do those tests. But another is to figure out how you can ensure the availability of that resource...A few years ago it even required moving an entire project from Oracle to PostgreSQL, so that everybody could have their own database server, with as many schemas as they wanted...not only did the database stop being a "critical resource" that was blocking testing, but it in fact [this] brought [us] into the agile world, letting us make database changes almost as easily as other code changes.[3]

This method enables us to write tests that are more comprehensive and less brittle. We gain the full power of the database, and we can freely shift logic to the database runtime if that makes sense for a given task. To this end I created ephemeralpg.org

ephemeralpg.org

pg_tmp(1): Run tests on an isolated, temporary PostgreSQL database
1.0 January 2015 Background database initialization. All major platforms supported
1.3 October 2015 Allow multiple major versions of PostgreSQL to be used
1.6 December 2015 Allow postgres(1) extra-options to be specified

The notion behind ephemeralpg.org is that writing unit tests for an application that uses PostgreSQL is not inherently difficult. The barrier to writing these tests is that it is not immediately apparent how to construct the test harness. I'll switch to a terminal to show basic invocation:

url=`pg_tmp -t`
echo $url
psql $url
> SELECT txid_current();

[back to slides]: A temporary database created with pg_tmp has a limited shared memory footprint and is automatically garbage-collected after the number of seconds specified by the -w option (the default is 60).

Anyone can work out a pattern for using initdb and pg_ctl to spin up a temporary database, but a naive approach does not result in very responsive startup times. pg_tmp is an off-the-shelf script that tries to get this as right as possible, and is tuned to spin up new instances in 1/10th of a second.

It starts PG on a socket by default, or allocates a TCP port if you ask it to, using a small utility called getsocket(1).

Demo: tagwatch

Low friction to pushing data integrity rules to the database

New features in PostgreSQL are readily available to application developers

Speedy test harness facilitates test-driven development

To see how this works in practice, let's finish building a tiny HTTP server that can be registered as a "webhook" on Bitbucket or Github.

The purpose of this little program will be to allow a port maintainer to be alerted whenever a new release is cut. When a commit is made containing a new tag, this little web service will stash the JSON payload sent by Bitbucket or Github and record four elements:

  1. The current time
  2. The name of the service (github or bitbucket)
  3. The name of the repository (user/repo)
  4. The name of the tag (project-version)

Once these values are recorded it is simple enough to take some other action, such as sending an e-mail with a link to the new software. We'll start by looking at what kind of data will be thrown at us.
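As a sketch of the record we are after, the four elements might be reduced to a row like this. The payload keys used here are hypothetical, flattened stand-ins; the real Bitbucket and Github payloads nest these values deeply:

```python
from datetime import datetime, timezone

def record_tag(service, payload):
    """Reduce a webhook payload to the four elements worth recording.
    The payload keys are simplified stand-ins for the real nested
    Bitbucket/Github structures."""
    return {
        "received_at": datetime.now(timezone.utc),  # 1. the current time
        "service": service,                         # 2. github or bitbucket
        "repository": payload["repository"],        # 3. user/repo
        "tag": payload["tag"],                      # 4. project-version
    }

row = record_tag("github", {"repository": "user/repo", "tag": "demo-1.0"})
print(row["service"], row["repository"], row["tag"])
```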

flattenjs is a simple tool that I assembled for transforming nested structures into Postgres-style json-path selectors:

cat bitbucket-tag.json | flattenjs color=on | grep tag
cat bitbucket-tag.json | flattenjs color=on | grep repository
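flattenjs itself is a small tool of the author's; as a rough Python approximation of the transformation it performs (the exact output format of the real tool may differ), consider:

```python
def flatten(obj, path=""):
    """Flatten a nested structure into Postgres-style json-path selectors,
    e.g. {"push": {"name": "v1"}} -> {".push.name": "v1"}.
    A rough approximation of what flattenjs prints."""
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            items.update(flatten(value, "%s.%s" % (path, key)))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            items.update(flatten(value, "%s[%d]" % (path, i)))
    else:
        items[path] = obj
    return items

# A simplified, hypothetical fragment of a Bitbucket tag payload
sample = {"push": {"changes": [{"new": {"type": "tag", "name": "demo-1.0"}}]}}
for selector, value in sorted(flatten(sample).items()):
    print(selector, "=", value)
```

The selectors that fall out of this are exactly the paths you can hand to Postgres json operators when writing queries against the stashed payload.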

.tmux.conf

bind-key C-t resize-pane -y 10 -t .1
bind-key C-r source-file ~/.tmux.conf

Commands

hg checkout -C demo
vim -p Makefile *.rb *.sql

ls Makefile *.rb *.sql | entr -c sh -c 'make || tmux resize-pane -y 20 -t .1'
I suppose there are times when it makes sense to put rules for data integrity in the application, but there are a number of advantages to putting this logic in the database:

  1. The database can guarantee consistent behavior across multiple applications
  2. We can use a function we create as a means of updating existing records if we decide to change our schema
  3. Better transactional semantics for multi-threaded applications

Patterns for System Tests

Use initialization fixtures plus current state from the environment

Propagate database connection parameters through an environment variable

require 'pg'
dburl = ENV['DATABASE_URL'] || raise("DATABASE_URL not set")
conn = PG.connect(dburl)

The small HTTP service that we have been evaluating contains only one table, but most projects will incorporate referential data that is integral to the schema.

If you run unit tests against a real database, then you probably have a complete or nearly complete set of initialization fixtures that enable joins to schema-related structures. Initialize the database using SQL and you will very easily accommodate custom types, functions, triggers or even custom extensions.
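A sketch of that initialization pattern follows. The schema is hypothetical, and sqlite3 stands in for the database connection only so the example is self-contained; in practice the fixture would live in schema.sql on disk and be applied to a pg_tmp database with psql(1):

```python
import os
import sqlite3
import tempfile

# Hypothetical initialization fixture; in a real project this is a
# schema.sql file applied to a pg_tmp database with psql(1)
SCHEMA = """
CREATE TABLE tags (
    received_at TEXT NOT NULL,
    service     TEXT NOT NULL,
    repository  TEXT NOT NULL,
    tag         TEXT NOT NULL
);
INSERT INTO tags VALUES ('2016-04-18', 'github', 'user/repo', 'demo-1.0');
"""

with tempfile.TemporaryDirectory() as tmp:
    fixture = os.path.join(tmp, "schema.sql")
    with open(fixture, "w") as f:
        f.write(SCHEMA)

    # Initialize the database using SQL loaded off of disk; sqlite3 is a
    # stand-in here only so that this sketch runs without a server
    db = sqlite3.connect(":memory:")
    with open(fixture) as f:
        db.executescript(f.read())
    count = db.execute("SELECT count(*) FROM tags").fetchone()[0]

print(count)
```

Because the fixture is plain SQL, the same file initializes the unit-test database and the database used by external system tests.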

These same fixtures should comprise nearly everything you need to spin up a database that will interact with external system tests. The only additional step is to add some minimal information about runtime environment.

Options for Resetting State

Revert Changes

def setup
    pg.exec "BEGIN;"
end
def teardown
    pg.exec "ROLLBACK;"
end

Wipe Data

def setup
    pg.exec "TRUNCATE tableA, tableB, ...;"
end

There are at least two means of resetting state in-between each test: ROLLBACK or TRUNCATE.

Unfortunately using BEGIN and ROLLBACK in the test runner will only work if the system under test does not itself run transactions. BEGIN TRANSACTION may be executed multiple times, but this is not nesting; COMMIT or ABORT will finalize or abandon everything. I'm open to ideas.

In my view the best way to clean up after each test is to run TRUNCATE. It's simple, fast, and it's not hard to drop other sorts of objects if need be.

(pg_create_restore_point() is close to the right idea, but restarting the server and running a recovery is expensive)

Freezing Expectations

Test Database Initialization

ALTER USER ... SET search_path TO test, public, pg_catalog;

Per-Connection Setup

  SET search_path TO test, public, pg_catalog;

Most record-keeping includes implicit environmental data as well as direct inputs. Perhaps the most common input from the runtime environment is the current date and time. In Postgres, overriding any function is as easy as defining a new function and then setting the search_path to resolve your custom functions first.

There is also a schema called pg_temp, which is local to each database connection. I haven't used it, but it must be useful in some cases.

Tuning the Critical Feedback Loop

Raw SQL generally performs very well

Method      Backend     Execution Speed
Django ORM  sqlite      25 tests in 16.48 seconds
SQL         postgresql  29 tests in 3.81 seconds

(Numbers are from two similar applications that access the same tables)

What about SQLite? Do we need to use SQLite to build responsive tests? No!

It is difficult to draw a fair performance comparison between projects that use different methods and do different things, but I collected some benchmarks from in-house applications that I think provide some useful insight. (My apologies: I was unable to find a pair of open-source projects that were similar enough to provide a useful comparison.)

These numbers are from two applications that I maintain which use the same database in production. They are both written in Python and use the psycopg2 database adapter. These numbers include Postgres startup time!

It is possible to become overly obsessed with the speed of tests, but there is a steep price to pay for tests that run longer than 30 seconds. After 2 or 3 minutes, tests of any sort are only useful to catch regressions. Regression tests are valuable, but if they take minutes or tens of minutes then you can no longer use them for test-driven design.

It is my experience [and I have heard this from others as well] that there is a direct correlation between the time to deliver a new feature and the speed of your tests. I think this link exists because excess delay encourages multi-tasking. Checking e-mails and switching tasks causes you to lose state on the task at hand.

Here is the good news: you do not have to choose between rapid feedback and testing against a real database. Load schemas off of disk, build your application around SQL and it is going to perform very well.

To be About

Tasks we engage in:

There are two ways of describing the role of a Database Administrator or Relational Data Architect. The first is by the things we do. The tasks or roles listed above are not comprehensive, and they are not a description of anyone in particular. But I hope they are representative of nearly everyone in one way--these are things we do to pave a path. This is a supportive role.

The headquarters for the company I work for is located on 45th St. in Manhattan. If I arrive there by way of Port Authority then I usually walk right through Times Square where marketing consultants continuously launch a barrage of messages--marketing packets intended to hit every person on the street.

One might think that nothing in Times Square could be offensive since the environment was not designed for anyone in particular. Yet I cannot help but feel misunderstood. You see, all of the demographic data in the world cannot see beyond the tasks of each day to the things we are about.

Running automated tests on an ephemeral database reduces impediments to testing and gives developers the ability to explore solutions enabled by the database.

Conclusion

Use the advanced features of PostgreSQL to simplify your application stack

Software engineering is largely a discipline of testing methodologies

The tools we avail ourselves of form the vantage point for the systems we design

Rich Hickey aptly described a database as something that provides leverage over data. Key-value stores, file systems, and other facilities with an API are all useful, but it is the programmatic capabilities of a database that make it pivotal to an application ecosystem. By making it easy for your team to employ the full set of features available in Postgres you are enabling your team to simplify an application stack.

In this session we started with the proposition that difficulty in testing is a symptom of poor design. This axiom is a statement about how software engineering works. A meaningful test strategy provides feedback on architecturally significant boundaries.

I will close with an illustration from another kind of creative endeavor. In 1998 I picked up my first point-and-shoot camera, a very clumsy device by Kodak. In 1999 I bought the newly released Nikon Coolpix 950. I figured that photography requires some mastery of lighting, so I practiced with a TTL flash, studio umbrellas, and reflectors. I carried this camera everywhere over the next six years, but made very little progress in capturing memorable images. Very little progress that is, until I picked up a Canon Elan 7 along with an inexpensive 50mm lens. Within weeks I was taking photos that I am pleased to display to this day.

Does this short story make sense? Can changing gear compensate for mediocre skill? No, of course not. But in large part the intelligence we apply to an endeavor is in the selection of processes that will in turn form us. We create tools, and these tools in turn provide the vantage point for the systems we design, and ultimately the shape of our own imagination.