Eric Radman : a Journal

Using an Intermediate Database for Queries

I have been using PostgreSQL for more than 15 years, and over that span of time it has always delivered a set of features that provide what I need to solve operational challenges.

Sometimes the challenge is that a production server is not authorized to use some features. Here I will explore how to spin up a temporary database for running a query that depends on functionality the server warehousing the data does not have. PL/Python is a example of a very useful extension that may not be installed or authorized in a given environment. (It's an untrusted language, which means only superusers can define functions.)

Foreign Servers

postgres_fdw provides a way to create a foreign table. If you are given a connection URL we need to split it into components

#!/usr/local/bin/ruby

require 'uri'
uri = URI("postgresql://query:XXX@db1/localharvest")
components = [ :scheme, :user, :password, :host, :port, :path, :query, :fragment ]
components.each do |part|
    puts "#{part}=#{uri.send(part)}"
end

This can be fed into the rest of the script which creates the foreign servers

#!/bin/sh
eval `uri_to_pars.rb $0`

psql -q $tmp_url <<SQL
  CREATE SCHEMA localharvest_remote;

  CREATE EXTENSION postgres_fdw;
  CREATE SERVER localharvest_query FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host '${host}', dbname '${path##/}', fetch_size '30000');
  CREATE USER MAPPING FOR $USER SERVER localharvest_query
    OPTIONS (user '${user}', password '${password}');

  CREATE EXTENSION hstore;
  IMPORT FOREIGN SCHEMA public FROM SERVER localharvest_query INTO localharvest_remote;
SQL

Queries From the Foreign Schema

Now we have a personal database that we can install extensions on. The following makes python-hostlist accessible from within the database

CREATE EXTENSION plpythonu;

CREATE FUNCTION hostlist_collapse(names text[]) RETURNS text AS $$
import hostlist
return hostlist.collect_hostlist(names)
$$ LANGUAGE plpythonu;

CREATE FUNCTION hostlist_expand(names text) RETURNS text[] AS $$
import hostlist
return hostlist.expand_hostlist(names)
$$ LANGUAGE plpythonu;

Now we can run queries using these utility functions

SET search_path=localharvest_remote,public;

SELECT hostlist_collapse(hosts)
FROM hosts
FROM machines;

-- Stub time dependent functions for unit tests
--
CREATE SCHEMA unit_tests;
SET search_path TO unit_tests, "postgres", public, pg_catalog;

CREATE OR REPLACE FUNCTION now() RETURNS timestamp AS $$
BEGIN
    RETURN clock_timestamp();
END;
$$ LANGUAGE plpgsql;

We could use the default schema public, but using an alternate name space allows us to see what we've created. If we want more performance we can also create materialized views to store a snapshot of the data locally.