Using an Intermediate Database for Queries
I have been using PostgreSQL for more than 15 years, and over that span of time it has always delivered a set of features that provide what I need to solve operational challenges.
Sometimes the challenge is that a production server is not authorized to use
some features. Here I will explore how to spin up a temporary database for
running a query that depends on functionality the server warehousing the data
does not have.
PL/Python
is a example of a very useful extension that may not be installed or
authorized in a given environment. (It's an
untrusted
language, which means only superusers can define functions.)
Foreign Servers
postgres_fdw provides a way to create a foreign table. If you are given a connection URL we need to split it into components
#!/usr/local/bin/ruby require 'uri' uri = URI("postgresql://query:XXX@db1/localharvest") components = [ :scheme, :user, :password, :host, :port, :path, :query, :fragment ] components.each do |part| puts "#{part}=#{uri.send(part)}" end
This can be fed into the rest of the script which creates the foreign servers
#!/bin/sh eval `uri_to_pars.rb $0` psql -q $tmp_url <<SQL CREATE SCHEMA localharvest_remote; CREATE EXTENSION postgres_fdw; CREATE SERVER localharvest_query FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '${host}', dbname '${path##/}', fetch_size '30000'); CREATE USER MAPPING FOR $USER SERVER localharvest_query OPTIONS (user '${user}', password '${password}'); CREATE EXTENSION hstore; IMPORT FOREIGN SCHEMA public FROM SERVER localharvest_query INTO localharvest_remote; SQL
Queries From the Foreign Schema
Now we have a personal database that we can install extensions on. The following makes python-hostlist accessible from within the database
CREATE EXTENSION plpythonu; CREATE FUNCTION hostlist_collapse(names text[]) RETURNS text AS $$ import hostlist return hostlist.collect_hostlist(names) $$ LANGUAGE plpythonu; CREATE FUNCTION hostlist_expand(names text) RETURNS text[] AS $$ import hostlist return hostlist.expand_hostlist(names) $$ LANGUAGE plpythonu;
Now we can run queries using these utility functions
SET search_path=localharvest_remote,public; SELECT hostlist_collapse(hosts) FROM hosts FROM machines; -- Stub time dependent functions for unit tests -- CREATE SCHEMA unit_tests; SET search_path TO unit_tests, "postgres", public, pg_catalog; CREATE OR REPLACE FUNCTION now() RETURNS timestamp AS $$ BEGIN RETURN clock_timestamp(); END; $$ LANGUAGE plpgsql;
We could use the default schema
public
,
but using an alternate name space allows us to see what we've created. If we
want more performance we can also create materialized views to store a
snapshot of the data locally.