Eric Radman : a Journal

Defensive Techniques for Unruly Database Clients

Setting the Application Name

One of the most helpful techniques in diagnosing aberrant behavior is to know who is connecting

pg=> select client_addr, now() - backend_start as backend_start_age,
      state, application_name from pg_stat_activity order by xact_start;
  client_addr  |   backend_start_age    |        state        | application_name
---------------+------------------------+---------------------+------------------
               | 00:14:41.518607        | active              | psql
(1 row)

The source address is useful, but it's even better if you can identify the specific application. Using the URI format, application_name can be set as an optional parameter

$ psql postgresql://user@localhost/testdb?application_name=rad_app
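libpq-based clients also honor the PGAPPNAME environment variable, so the same name can be set without editing the URI (using the same example connection string):

```shell
$ PGAPPNAME=rad_app psql postgresql://user@localhost/testdb
```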

From a package like SQLAlchemy this is more obscure. The trick is to pass the connect_args keyword to create_engine

import os, sys, pwd, socket
import sqlalchemy

prog = os.path.basename(sys.argv[0]) or 'rad_app'
username = pwd.getpwuid(os.getuid()).pw_name
hostname = socket.gethostname().split(".")[0]
connect_args = {'application_name': "%s:%s@%s" % (prog, username, hostname)}
engine = sqlalchemy.create_engine("postgresql://test2@localhost/test",
                                  connect_args=connect_args)

The extra work of composing the program name, username, and hostname into application_name also gives us the ability to respond programmatically, since the value can be parsed. Under normal circumstances it is impossible to limit connections by connecting user, but now I can. The following query can be run periodically to terminate connections from any individual user that insists on holding more than 100 database connections

\set connections_per_user 100
WITH activity AS (
  SELECT
    pid, state, client_addr, application_name,
    greatest(state_change, query_start, xact_start) AS last_activity,
    regexp_split_to_array(application_name, E'[:@]') AS parts
  FROM pg_stat_activity
),
ranked AS (
  SELECT
    pid, client_addr, state, application_name,
    date_trunc('second', last_activity) AS last_activity,
    parts[2] AS "user",
    RANK() OVER (PARTITION BY parts[2] ORDER BY last_activity) AS rank
  FROM activity
)
SELECT pg_terminate_backend(pid) AS x,
       client_addr, state, application_name, last_activity, "user"
FROM ranked
WHERE rank > :connections_per_user;
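The same composite name can be unpacked on the client side as well. A minimal sketch using Python's re module with the same [:@] separator pattern (the example value is hypothetical):

```python
import re

# application_name formatted as prog:user@host, as constructed above
application_name = "rad_app:test2@dbhost1"
prog, user, host = re.split(r"[:@]", application_name)
print(user)  # the second field, matching parts[2] in SQL's 1-based arrays
```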

Stopping Idle Transactions

Starting a transaction (BEGIN) without terminating it (COMMIT or ABORT) has many serious implications. Open transactions block schema changes, they cannot be serialized using a proxy such as pgbouncer, and they prevent autovacuum from running. When using pgbouncer, an upper limit can be set in the config
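A sketch of the relevant pgbouncer.ini setting; idle_transaction_timeout is disabled (0) by default, and 300 seconds is an arbitrary choice:

```ini
; pgbouncer.ini: disconnect clients that sit "idle in transaction"
; longer than this many seconds
idle_transaction_timeout = 300
```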


Out of band, these can be terminated using a periodic cron job

SELECT pg_terminate_backend(pid), usename, client_addr,
       now() - xact_start AS xact_age, application_name
FROM pg_stat_activity
WHERE state = 'idle in transaction'
AND xact_start < now() - interval '1 hour';
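As a sketch, that query could be saved to a file and scheduled hourly; the file path and database name here are placeholders:

```shell
# /etc/cron.d/pg_kill_idle: run hourly as the postgres user
0 * * * * postgres psql -X -q -d testdb -f /usr/local/sql/kill_idle_xact.sql
```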

Why would a programmer write software that leaves transactions open in the first place? This is usually the fault of APIs that create dangerous implicit behavior. The Python DB-API mandates that transactions be opened implicitly by default

Warning By default, any query execution, including a simple SELECT will start a transaction: for long-running programs, if no further action is taken, the session will remain "idle in transaction", an undesirable condition... — Psycopg autocommit
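One mitigation is to opt out of the implicit transaction entirely by enabling psycopg2's autocommit attribute; a sketch, with a placeholder connection URI:

```python
import psycopg2

conn = psycopg2.connect("postgresql://user@localhost/testdb")
conn.autocommit = True  # no implicit BEGIN; each statement commits on its own
cur = conn.cursor()
cur.execute("SELECT 1")  # leaves no transaction open afterwards
```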

It is also possible to sanitize psycopg2 activity in a PG proxy. This is an extreme measure of course, but under demanding conditions the option is available.

Closing Connections

Once again, Python DB libraries provide the wrong behavior. The following code appears to open and close a connection

with psycopg2.connect(uri) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT 1")

But only the cursor is closed; exiting the connection's with block merely ends the transaction and leaves the connection open. The correct way to write this is to close the connection explicitly

conn = psycopg2.connect(uri)
try:
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
finally:
    conn.close()

To release connections immediately with SQLAlchemy, disable connection pooling.

from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

engine = create_engine(uri, poolclass=NullPool)

In this situation the effects of connection pooling are purely negative, since pooled connections hold extra server resources and there is no way to automatically recover from a disconnect.

Last updated on March 08, 2017