A guide for accessing the features of PostgreSQL in test-driven development
PGConf 2016, New York City
Abstract
While integration tests and system tests are valuable for catching regressions introduced by a commit, it is fast, self-contained unit tests that provide the tight feedback loop which makes test-driven development possible.
Effective unit testing requires a software designer to define a set of abstractions which will comprise a particular system. If unit tests become brittle obstacles to refactoring or ineffective in providing design insight, then the approach we use for testing must be revised.
In this talk we will explore methods of treating PostgreSQL as a runtime component of your application and not as an external service. The end goal will be to learn to write tests that neatly capture units of functionality which depend on an SQL engine.
Preamble
In Richard Hipp's 2014 keynote address in Ottawa, he contrasted SQLite with PostgreSQL: PG is "Client/Server", SQLite is "Serverless", and so on. In the conclusion of his talk he made a compact recommendation that describes the perspective you need in order to use Postgres effectively:
Think of PostgreSQL as a programming language
This is an astute observation. When used as a programming environment, Postgres becomes part of your application runtime, and as such, applications written to take advantage of this runtime require an approach to unit testing that verifies this functionality at the right level.
Everything we cover here will be applicable to system tests and integration tests, but we will focus on unit tests in particular.
Unit testing is a software engineering discipline based on a premise put forth by Kent Beck which states that difficulty testing is a symptom of poor design. Kent's articulation of this principle can be summed up like this:
Design that is conducive to testing also depends on structural components that are at once easy to comprehend, easy to use in different contexts, and easy to modify.
For most programmers that I know, the constraints of unit testing are perplexing. But I have found that non-programmers sometimes get this if I explain it in this way:
The best way to build software that can be adapted and maintained is to build software that is testable. One way to make testable software is to write tests first.
This sounds like a formula, but you, the programmer, are still making decisions. One of the significant challenges in this process is to find what Robert Martin identifies as architecturally significant boundaries [1].
With this in mind, let's think through some patterns for the design of systems that leverage a database.
A test is not a unit test if (Michael Feathers, September 2005, "Don't let slow tests bog you down"):
- It talks to the database
- It communicates across the network
- It touches the file system
- It can't run at the same time as any of your other unit tests
- You have to do special things to your environment
Unit tests cover a unit of functionality. To use the metaphor of human language, the goal is not to identify "sentence fragments", but "whole paragraphs" that represent a coherent operation. Finding the right level to zoom in to is about finding a point of reference that takes into account the logic of the problem you are trying to solve.
In order to communicate the unique role that unit tests play, people have devised rules for describing what they can and cannot do.
Here is a typical set of rules. Let's try to state these axioms positively (in reverse order):
(5) Self-hosting test harness: git clone; make test is all you have to do to run the tests
(4) Implies dynamic allocation of port numbers and other global or shared resources
(3) Tests run independently of the contents of the host file system by emulating system calls such as open(3), write(3) and seek(3), and so on
(2) Tests run locally, hence they are indifferent to the state of the network
(1) What is the motivation for prohibiting access to the database?
Generally true of commercial databases
These characteristics are all generally true of commercial databases. Parenthetically, I would also add that such commercial databases are hostile to software engineering because they do not facilitate fast, repeatable tests.
None of these problems are difficult to overcome with open-source software.
Because Postgres is designed to scale up and not down, it does put significant demands on the test harness.
SQLite imposes none of these complexities. Will SQLite do?
cd ~/src/Django-1.8.3
PYTHONPATH=. python3.4 tests/runtests.py \
    --settings=test_sqlite migrations

sqlite/3.8     | 315 tests in 14.45 seconds
postgresql/9.4 | 315 tests in 36.82 seconds
Benefit: easy setup, fast
Cost: give up the advanced features of PostgreSQL
Treating SQLite as a replacement for PG in your application is tempting: it's fast and doesn't require disk access... and it might be a correct representation of what happens in production.
The ability to use SQLite as a "fake" instance of PG is a useful option. But, giving up on functionality that is useful to your application is a stiff tradeoff.
Will SQLite do? As a place to start, yes. The temptation to use SQLite for running tests is a good one, because you are already starting to define the scope of your tests by teaching your application to take advantage of the database.
You can mock an API but you cannot mock a language
SQL is a declarative language
Mock high
How many of you know what the "PL" stands for in PL/pgSQL? No, it's not Programming Language, it's Procedural Language/PostgreSQL. That is in contrast with standard SQL, which is a Declarative Language.
If SQL is a language how do you write tests for an application that interacts with a database?
Here's what you do: encapsulate all access to your database through an implementation of an interface called MyDatabaseService (or whatever). ... The mock will have no logic: it will simply return the data required -- James Sadler, September 2005
My interpretation of this advice is this: reduce SQL to the smallest set of operators and logical constructs that you can live with and then test each code fragment.
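A minimal Ruby sketch of that style (the class and method names here are hypothetical) looks something like this:

# Hypothetical illustration of the "mock the service" approach:
# the interface hides all SQL, and the mock has no logic.
class MockDatabaseService
  def latest_tag(repo)
    # Simply return the canned data the test expects.
    { "repository" => repo, "tag" => "v1.0" }
  end
end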
Could you test an application that used embedded JavaScript this way? Perhaps, but aside from the very painful process of emulating bits of the JavaScript VM you would be creating poor tests for one very important reason:
You are limiting the use of JavaScript in your application to the means of combination that you were able or willing to mimic in your test framework.
Remember this thesis: difficulty testing is a symptom of poor design. If you are having such trouble testing, then you are probably doing damage to your production code as well. So what is the alternative?
"Mock High" is a term I learned from Ryan Davis, it means you only use mocks to detach from unreliable resources.
If you do this, you have adopted what I would call Coextensive Unit Testing:

Responsive test cases which
- Cover all essential behaviors of a specific functional unit
- Rely on the test harness to establish runtime integration
A unit test is commensurate with individual functional units. Hence, the five rules I started with could be redrawn in this light.
The subtext to what I am calling Coextensive Unit Testing is that you do not normally add mocks for speed [1][2]. If you have slow tests you should figure out why they are slow. In 2005 Curt Sampson wrote:
Sure, one approach is not to do those tests. But another is to figure out how you can ensure the availability of that resource...A few years ago it even required moving an entire project from Oracle to PostgreSQL, so that everybody could have their own database server, with as many schemas as they wanted...not only did the database stop being a "critical resource" that was blocking testing, but it in fact [this] brought [us] into the agile world, letting us make database changes almost as easily as other code changes.[3]
This method enables us to write tests that are more comprehensive and less brittle. We gain the full power of the database, and we can freely shift logic to the database runtime if that makes sense for a given task. To this end I created ephemeralpg.org.
pg_tmp(1): Run tests on an isolated, temporary PostgreSQL database
1.0 | January 2015  | Background database initialization; all major platforms supported
1.3 | October 2015  | Allow multiple major versions of PostgreSQL to be used
1.6 | December 2015 | Allow postgres(1) extra-options to be specified
The notion behind ephemeralpg.org is that writing unit tests for an application that uses PostgreSQL is not inherently difficult. The barrier to writing these tests is that it's not immediately apparent how to construct the test harness. I'll switch to a terminal to show basic invocation:
url=`pg_tmp -t`
echo $url
psql $url
> SELECT txid_current();
[back to slides]: A temporary database created with pg_tmp has a limited shared memory footprint and is automatically garbage-collected after the number of seconds specified by the -w option (the default is 60).
Anyone can work out a pattern for using initdb and pg_ctl to spin up a temporary database, but a naive approach does not result in very responsive startup times. pg_tmp is an off-the-shelf script that tries to get this as right as possible, and is tuned to spin up new instances in 1/10th of a second.

It starts PG on a socket by default, or allocates a TCP port if you ask it to, using a small utility called getsocket(1).
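For example, a Ruby test helper might lean on pg_tmp to provision a database per test run; the schema file name below is an assumption:

require 'pg'

# Provision a throwaway database for this test run. -t asks pg_tmp to
# allocate a TCP port; the instance is garbage-collected automatically.
dburl = `pg_tmp -t`.chomp
conn = PG.connect(dburl)

# Load initialization fixtures ('schema.sql' is a hypothetical name).
conn.exec(File.read('schema.sql'))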
- Low friction for pushing data integrity rules to the database
- New features in PostgreSQL are readily available to application developers
- Speedy test harness facilitates test-driven development
To see how this works in practice, let's finish building a tiny HTTP server that can be registered as a "webhook" on Bitbucket or Github.
The purpose of this little program will be to allow a port maintainer to be alerted whenever a new release is cut. When a commit is made containing a new tag, this little web service will stash the JSON payload sent by Bitbucket or Github and record four elements from it.
Once these values are recorded it is simple enough to take some other action, such as sending an e-mail with a link to the new software. We'll start by looking at what kind of data will be thrown at us.
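As a sketch only (the column names here are illustrative assumptions, not necessarily the four elements from the demo), the table might stash the raw payload alongside the extracted values:

# Hypothetical shape for the webhook table: keep the raw JSON payload
# and pull out a few columns that are convenient to query directly.
conn.exec(<<-SQL)
  CREATE TABLE IF NOT EXISTS release (
      payload    json NOT NULL,
      repository text NOT NULL,
      tag        text NOT NULL,
      received   timestamptz NOT NULL DEFAULT now()
  );
SQL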
flattenjs is a simple tool that I assembled for transforming nested structures into Postgres-style json-path selectors:
cat bitbucket-tag.json | flattenjs color=on | grep tag
cat bitbucket-tag.json | flattenjs color=on | grep repository
.tmux.conf
bind-key C-t resize-pane -y 10 -t .1
bind-key C-r source-file ~/.tmux.conf
Commands
hg checkout -C demo
vim -p Makefile *.rb *.sql
ls Makefile *.rb *.sql | entr -c sh -c 'make || tmux resize-pane -y 20 -t .1'
I suppose there are times when it makes sense to put rules for data integrity in the application, but there are a number of advantages to putting this logic in the database: 1) the database can guarantee consistent behavior across multiple applications; 2) we can use a function we create as a means of updating existing records if we decide to change our schema; 3) better transactional semantics for multi-threaded applications.
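As a sketch of point 1, a rule expressed in the schema holds for every client that writes to the table; the table name and tag pattern below are assumptions:

# Hypothetical example: require that a tag looks like a version number,
# enforced by the database for every application that writes here.
conn.exec(<<-'SQL')
  ALTER TABLE release ADD CONSTRAINT release_tag_format
      CHECK (tag ~ '^v?[0-9]+(\.[0-9]+)*$');
SQL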
Use initialization fixtures plus current state from the environment
Propagate database connection parameters through an environment variable
require 'pg'
dburl = ENV['DATABASE_URL'] || raise("DATABASE_URL not set")
conn = PG.connect(dburl)
The small HTTP service that we have been evaluating contains only one table, but most projects will incorporate referential data which is very much integral to the schema.
If you run unit tests against a real database, then you probably have a complete or nearly complete set of initialization fixtures that enable joins to schema-related structures. Initialize the database using SQL and you will very easily accommodate custom types, functions, triggers or even custom extensions.
These same fixtures should comprise nearly everything you need to spin up a database that will interact with external system tests. The only additional step is to add some minimal information about the runtime environment.
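In practice this can be as simple as replaying the same SQL files in order; the file names here are hypothetical:

# Replay initialization fixtures into the temporary database. Plain SQL
# files accommodate custom types, functions, triggers, and extensions.
%w[schema.sql reference_data.sql].each do |path|
  conn.exec(File.read(path))
end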
Revert Changes
def setup
  pg.exec "BEGIN;"
end

def teardown
  pg.exec "ROLLBACK;"
end
Wipe Data
def setup
  pg.exec "TRUNCATE tableA, tableB, ...;"
end
There are at least two means of resetting state in between each test: ROLLBACK or TRUNCATE.
Unfortunately, using BEGIN and ROLLBACK in the test runner will only work if the system under test does not itself run transactions. BEGIN TRANSACTION may be executed multiple times, but this is not nesting; COMMIT or ABORT will finalize or abandon everything. I'm open to ideas.
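A sketch of the failure mode (the table name is a placeholder):

pg.exec "BEGIN;"                              # test runner opens a transaction
pg.exec "INSERT INTO tableA DEFAULT VALUES;"  # work done by the system under test
pg.exec "BEGIN;"                              # second BEGIN: a warning, not a nested transaction
pg.exec "COMMIT;"                             # code under test commits *everything* since the first BEGIN
pg.exec "ROLLBACK;"                           # teardown: no transaction left, nothing is reverted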
In my view the best way to clean up after each test is to run TRUNCATE. It's simple, fast, and it's not hard to drop other sorts of objects if need be.
(pg_create_restore_point() is close to the right idea, but restarting the server and running a recovery is expensive.)
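A teardown along these lines (table names are placeholders) might read:

def teardown
  # RESTART IDENTITY resets sequences; CASCADE clears referencing rows.
  pg.exec "TRUNCATE tableA, tableB RESTART IDENTITY CASCADE;"
end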
Test Database Initialization
ALTER USER ... SET search_path TO test, public, pg_catalog;
Per-Connection Setup
SET search_path TO test, public, pg_catalog;
Most record-keeping includes implicit environmental data as well as direct inputs. Perhaps the most common input from the runtime environment is the current date and time. In Postgres, overriding any function is as easy as defining a new function and then setting the search_path to resolve your custom functions first.
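For instance, a test schema can shadow now() so that every run sees a fixed clock. This is a sketch, assuming the search_path shown above; because pg_catalog is listed last, test.now() resolves first:

# Shadow pg_catalog.now() with a fixed timestamp for repeatable tests.
conn.exec(<<-SQL)
  CREATE SCHEMA IF NOT EXISTS test;
  CREATE OR REPLACE FUNCTION test.now() RETURNS timestamptz AS
      $$ SELECT timestamptz '2016-01-01 00:00:00' $$ LANGUAGE sql;
  SET search_path TO test, public, pg_catalog;
SQL
puts conn.exec("SELECT now()").getvalue(0, 0)  # resolves to test.now()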
There is also a schema called pg_temp, which is local to each database connection. I haven't used it, but it must be useful in some cases.
Raw SQL generally performs very well
Method | Backend | Execution Speed
---|---|---
Django ORM | sqlite | 25 tests in 16.48 seconds
SQL | postgresql | 29 tests in 3.81 seconds
(Numbers are from two similar applications that access the same tables)
What about SQLite? Do we need to use SQLite to build responsive tests? No!
It is difficult to draw a fair performance comparison between projects that use different methods and do different things, but I collected some benchmarks from in-house applications that I think provide some useful insight. (My apologies: I was unable to find a pair of open-source projects that were similar enough to provide a useful comparison.)
These numbers are from two applications that I maintain which use the same database in production. They are both written in Python and use the psycopg2 database adapter. These numbers include Postgres startup time!
It is possible to become overly obsessed with the speed of tests, but there is a steep price to pay for tests that run longer than 30 seconds. After 2 or 3 minutes, tests of any sort are only useful to catch regressions. Regression tests are valuable, but if they take minutes or tens of minutes then you can no longer use them for test-driven design.
It is my experience [and I have heard this from others as well] that there is a direct correlation between time to deliver a new feature and the speed of your tests. I think this link exists because excess delay encourages multi-tasking. Checking e-mails and switching tasks causes you to lose state on the task.
Here is the good news: you do not have to choose between rapid feedback and testing against a real database. Load schemas off of disk, build your application around SQL and it is going to perform very well.
Tasks we engage in:
There are two ways of describing the role of a Database Administrator or Relational Data Architect. The first is by the things we do. The tasks or roles listed above are not comprehensive, and they are not a description of anyone in particular. But I hope they are representative of nearly everyone in one way--these are things we do to pave a path. This is a supportive role.
The headquarters for the company I work for is located on 45th St. in Manhattan. If I arrive there by way of Port Authority then I usually walk right through Times Square where marketing consultants continuously launch a barrage of messages--marketing packets intended to hit every person on the street.
One might think that nothing in Times Square could be offensive since the environment was not designed for anyone in particular. Yet I cannot help but feel misunderstood. You see, all of the demographic data in the world cannot see beyond the tasks of each day to the things we are about.
Running automated tests on an ephemeral database reduces impediments to testing and gives developers the ability to explore solutions enabled by the database.
Use the advanced features of PostgreSQL to simplify your application stack
Software engineering is largely a discipline of testing methodologies
The tools we avail ourselves of form the vantage point for the systems we design
Rich Hickey aptly described a database as something that provides leverage over data. Key-value stores, file systems, and other facilities with an API are all useful, but it is the programmatic capabilities of a database that make it pivotal to an application ecosystem. By making it easy for your team to employ the full set of features available in Postgres you are enabling your team to simplify an application stack.
In this session we started with the proposition that difficulty in testing is a symptom of poor design. This axiom is a statement about how software engineering works. A meaningful test strategy provides feedback on architecturally significant boundaries.
I will close with an illustration from another kind of creative endeavor. In 1998 I picked up my first point-and-shoot camera, a very clumsy device by Kodak. In 1999 I bought the newly released Nikon Coolpix 950. I figured that photography requires some mastery of lighting, so I practiced with a TTL flash, studio umbrellas, and reflectors. I carried this camera everywhere over the next six years, but made very little progress in capturing memorable images. Very little progress that is, until I picked up a Canon Elan 7 along with an inexpensive 50mm lens. Within weeks I was taking photos that I am pleased to display to this day.
Does this short story make sense? Can changing gear compensate for mediocre skill? No, of course not. But in large part the intelligence we apply to an endeavor is in the selection of processes that will in turn form us. We create tools, and these tools in turn provide the vantage point for the systems we design, and ultimately the shape of our own imagination.