%h1 Iterative Large Updates

%p
  Among the many strong features of a relational database is the capability to
  adjust existing of records

:codeblock
  :::sql
  UPDATE employees SET hired='2017-07-07' WHERE name='Kevin';

%p
  Even at a small scale there are some dangers. The
  %code WHERE
  clause is
  %a{:href=>"https://github.com/eradman/pg-safeupdate/"} not mandatory
  so it helps to try it in a transaction first. And you can see the results as
  well

:codeblock
  :::sql
  BEGIN;
  UPDATE employees SET hired='2017-07-07' WHERE name='Kevin'
  RETURNING employees.*;
  ROLLBACK;

%p
  Once content with the results change the last line to
  = succeed '.' do
    %code COMMIT

%h2 Suspend Side-Effects

%p
  If you are attempting to modify thousands or millions of rows, update triggers
  need to be accounted for. At the very least these triggers will cause bulk
  updates to run slowly. Fortunately it is possible to disarm these by setting

:codeblock
  :::sql
  SET session_replication_role = replica;

%p
  A word of caution: this also disables system triggers such as foreign key
  constraints. You also need to be running as a user that includes the role
  %em Superuser.

%h2 Make a List

%p
  Freezing a list of records that should be targeted for an update allows you
  to run careful analysis on a group before and after a change is made

:codeblock
  :::sql
  CREATE MATERIALIZED VIEW update_candidates AS (
    SELECT id
    FROM employees
    WHERE hired IS NULL
  );

%p
  This allso allows one to run a series of updates without as much worry that
  one update will skew the results of subsequent queries.

%h2 Running Iterations

%p
  Rarely is it a good practive to run an update across five or ten million rows
  in a single transaction. This leaves far too much room for human error. One
  solution to this is to run an update on a small sample of records.
  %a{:href=>"https://www.postgresql.org/docs/current/static/app-psql.html"}psql(1)
  Allows you to set variables in an SQL script using
  %code \set var value
  or on the command line using the
  %code -v
  flag.

:codeblock
  psql hrdb -v mod_value=1 -f /tmp/update.sql

%p
  With this value in place we can proceed with a trial that bins records into
  approximately 100 groups

:codeblock
  :::sql
  BEGIN;
  UPDATE products
  SET in_stock='f'
  WHERE id % 100 = :mod_value
  AND in_stock IS NULL;
  ROLLBACK;

%p
  Once we're content with this we can run all 100 iterations

:codeblock
  :::sh
  for v in {0..99}
  do
    echo "pass $v of 99"
    psql hrdb -v mod_value=$v -f /tmp/update.sql
    sleep 10
  done

%p
  As this runs the administrator has the opportunity to see the new data going
  live progressively! If this change becomes suspect, the current update can be
  canceled with
  %em Ctrl-C
  and easily resumed by changing the starting number.

%h2 Range Updates

%p
  If you have a very large number of records, it may be nessesary to step
  through them in ranges.  This has the advantage of being a very fast query for
  a btree, and limits the chunk size of each update.

:codeblock
  :::sql
  WHERE id >= :min AND id < :max

%p
  The corresponding shell script might look like this

:codeblock
  :::sh
  let min=0
  while true
  do
    max=$((min+100))
    echo "$min-$max"
    psql hrdb -v min=$min -v max=$max -f /tmp/update.sql
    sleep 10
    min=$max
  done

%p
  If you are running a hot standby that is actively queried, keeping each update
  small is important to prevent lag.