Eric Radman : a Journal

Defensive Techniques for Unruly Database Clients

Setting the Application Name

One of the most helpful techniques in diagnosing aberrant behavior is to know who is connecting

pg=> select client_addr, now() - backend_start as backend_start_age,
      state, application_name from pg_stat_activity order by xact_start;
  client_addr  |   backend_start_age    |        state        | application_name
---------------+------------------------+---------------------+------------------
               | 00:14:41.518607        | active              | psql
(1 row)

The source address is useful, but it's even better if you can identify the specific application. Using the URI format, application_name can be set as an optional parameter

$ psql postgresql://user@localhost/testdb?application_name=rad_app
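libpq-based clients also honor the PGAPPNAME environment variable, so the same name can be set without editing the URI (using the same example connection string):

```shell
$ PGAPPNAME=rad_app psql postgresql://user@localhost/testdb
```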

From a package like SQLAlchemy this is more obscure. The trick is to pass the connect_args keyword to create_engine

import os, sys, pwd, socket
import sqlalchemy

prog = os.path.basename(sys.argv[0]) or 'rad_app'
username = pwd.getpwuid(os.getuid()).pw_name
hostname = socket.gethostname().split(".")[0]
connect_args = {'application_name': "%s:%s@%s" % (prog, username, hostname)}
engine = sqlalchemy.create_engine("postgresql://test2@localhost/test",
                                  connect_args=connect_args)

The extra work of composing the program name, username, and hostname into application_name also gives us the ability to respond programmatically, since the value can be parsed. Under normal circumstances it is impossible to limit connections by connecting user, but now I can. The following query can be run periodically to terminate connections from any individual user that insists on holding more than 100 database connections

\set connections_per_user 100
WITH activity AS (
  SELECT
    pid, state, client_addr, application_name,
    greatest(state_change, query_start, xact_start) AS last_activity,
    regexp_split_to_array(application_name, E'[:@]') AS parts
  FROM pg_stat_activity
),
ranked AS (
  SELECT
    pid, client_addr, state, application_name,
    date_trunc('second', last_activity) AS last_activity,
    parts[2] AS "user",
    RANK() OVER (PARTITION BY parts[2] ORDER BY last_activity) AS rank
  FROM activity
)
SELECT pg_terminate_backend(pid) AS x,
       client_addr, state, application_name, last_activity, "user"
FROM ranked
WHERE rank > :connections_per_user;
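The same composite name can be unpacked on the client side as well. A minimal sketch using Python's re module with the same [:@] separator pattern (the example value is hypothetical):

```python
import re

# application_name formatted as prog:user@host, as constructed above
application_name = "rad_app:test2@dbhost1"
prog, user, host = re.split(r"[:@]", application_name)
print(user)  # the second field, matching parts[2] in SQL's 1-based arrays
```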

Stopping Idle Transactions

Starting a transaction (BEGIN) without terminating it (COMMIT or ABORT) has many serious implications. Open transactions block schema changes, they cannot be serialized using a proxy such as pgbouncer, and they prevent autovacuum from running. When using pgbouncer, an upper limit can be set in the config
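A sketch of the relevant pgbouncer.ini setting; idle_transaction_timeout is disabled (0) by default, and 300 seconds is an arbitrary choice:

```ini
; pgbouncer.ini: disconnect clients that sit "idle in transaction"
; longer than this many seconds
idle_transaction_timeout = 300
```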


Out of band, these can be terminated using a periodic cron job

SELECT pg_terminate_backend(pid), usename, client_addr,
       now() - xact_start AS xact_age, application_name
FROM pg_stat_activity
WHERE state = 'idle in transaction'
AND xact_start < now() - interval '1 hour';
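As a sketch, that query could be saved to a file and scheduled hourly; the file path and database name here are placeholders:

```shell
# /etc/cron.d/pg_kill_idle: run hourly as the postgres user
0 * * * * postgres psql -X -q -d testdb -f /usr/local/sql/kill_idle_xact.sql
```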

Why would a programmer write software that leaves transactions open in the first place? This is usually the fault of APIs that create dangerous implicit behavior. The Python DB-API mandates that transactions be opened implicitly by default

Warning By default, any query execution, including a simple SELECT will start a transaction: for long-running programs, if no further action is taken, the session will remain "idle in transaction", an undesirable condition... — Psycopg autocommit
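One mitigation is to opt out of the implicit transaction entirely by enabling psycopg2's autocommit attribute; a sketch, with a placeholder connection URI:

```python
import psycopg2

conn = psycopg2.connect("postgresql://user@localhost/testdb")
conn.autocommit = True  # no implicit BEGIN; each statement commits on its own
cur = conn.cursor()
cur.execute("SELECT 1")  # leaves no transaction open afterwards
```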

It is also possible to sanitize psycopg2 activity in a PG proxy. This is an extreme measure of course, but under demanding conditions the option is available.

Closing Connections

Once again, Python DB libraries provide the wrong behavior. The following code appears to open and close a connection

with psycopg2.connect(uri) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT 1")

But only the cursor is closed; exiting the connection's with block merely ends the transaction and leaves the connection open. The correct way to write this is to close the connection explicitly

conn = psycopg2.connect(uri)
try:
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
finally:
    conn.close()

To release connections immediately with SQLAlchemy, disable connection pooling.

from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

engine = create_engine(uri, poolclass=NullPool)

In this situation the effects of connection pooling are purely negative, since pooled connections hold extra server resources and there is no way to automatically recover from a disconnect.

Last updated on March 08, 2017