%h1 Iterative Large Updates %p Among the many strong features of a relational database is the capability to adjust existing of records :codeblock :::sql UPDATE employees SET hired='2017-07-07' WHERE name='Kevin'; %p Even at a small scale there are some dangers. The %code WHERE clause is %a{:href=>"https://github.com/eradman/pg-safeupdate/"} not mandatory so it helps to try it in a transaction first. And you can see the results as well :codeblock :::sql BEGIN; UPDATE employees SET hired='2017-07-07' WHERE name='Kevin' RETURNING employees.*; ROLLBACK; %p Once content with the results change the last line to = succeed '.' do %code COMMIT %h2 Suspend Side-Effects %p If you are attempting to modify thousands or millions of rows, update triggers need to be accounted for. At the very least these triggers will cause bulk updates to run slowly. Fortunately it is possible to disarm these by setting :codeblock :::sql SET session_replication_role = replica; %p A word of caution: this also disables system triggers such as foreign key constraints. You also need to be running as a user that includes the role %em Superuser. %h2 Make a List %p Freezing a list of records that should be targeted for an update allows you to run careful analysis on a group before and after a change is made :codeblock :::sql CREATE MATERIALIZED VIEW update_candidates AS ( SELECT id FROM employees WHERE hired IS NULL ); %p This allso allows one to run a series of updates without as much worry that one update will skew the results of subsequent queries. %h2 Running Iterations %p Rarely is it a good practive to run an update across five or ten million rows in a single transaction. This leaves far too much room for human error. One solution to this is to run an update on a small sample of records. %a{:href=>"https://www.postgresql.org/docs/current/static/app-psql.html"}psql(1) Allows you to set variables in an SQL script using %code \set var value or on the command line using the %code -v flag. :codeblock psql hrdb -v mod_value=1 -f /tmp/update.sql %p With this value in place we can proceed with a trial that bins records into approximately 100 groups :codeblock :::sql BEGIN; UPDATE products SET in_stock='f' WHERE id % 100 = :mod_value AND in_stock IS NULL; ROLLBACK; %p Once we're content with this we can run all 100 iterations :codeblock :::sh for v in {0..99} do echo "pass $v of 99" psql hrdb -v mod_value=$v -f /tmp/update.sql sleep 10 done %p As this runs the administrator has the opportunity to see the new data going live progressively! If this change becomes suspect, the current update can be canceled with %em Ctrl-C and easily resumed by changing the starting number. %h2 Range Updates %p If you have a very large number of records, it may be nessesary to step through them in ranges. This has the advantage of being a very fast query for a btree, and limits the chunk size of each update. :codeblock :::sql WHERE id >= :min AND id < :max %p The corresponding shell script might look like this :codeblock :::sh let min=0 while true do max=$((min+100)) echo "$min-$max" psql hrdb -v min=$min -v max=$max -f /tmp/update.sql sleep 10 min=$max done %p If you are running a hot standby that is actively queried, keeping each update small is important to prevent lag.