Single-Minded Configuration

A Guide to Avoiding Abstractions in Systems Orchestration

<ericshane@eradman.com>

November 2020

Abstract

Configuration management is a term that is usually used to describe a declarative approach to systems. Declarative configuration allows you to write down the intended state of a system and communicate these changes through a series of commits. The benefits of these tools are immense, but they come bundled with eye-watering complexity.

  1. Developing a "stabilized approach"
  2. How systems configuration is comparable to building software
  3. Why service orchestration appears to be difficult
  4. What sort of solutions are too simple
  5. The basis for a new generation of light-weight tools
  6. Advances in networking that facilitate push-based configuration

Preamble

I will never be a pilot, but I have a good deal of admiration for these men and women. One of the features of this profession that inspires me is the personal nature of working in tight formation with the rest of the crew. Another is the habit of verbally cross-checking their actions with procedures and the information they are receiving. They also have some outstanding methods of ordering priorities. The motto that most directly speaks to this is,

Aviate, Navigate, Communicate

Aviate, Navigate, Communicate

What does it mean to "Aviate"?

That means fly the airplane by using the flight controls and flight instruments to direct the airplane's attitude, airspeed, and altitude

Advanced Qualification Program

Simulator training

Never bust minimums

Train for all scenarios known to be problematic

Dan Gryder is a flight instructor who has taken up the cause of studying accident reports in general aviation. This, and other hard personal experiences in his life, put him on a mission to give pilots in single-engine aircraft everything they need to avoid a loss of control.

Part of his technique is to import the habits and tools that commercial airline pilots have. One of the interesting experiments he conducted was to ask airline pilots which skill was more important: a) stall recovery or b) energy management. Can you guess what they said?

You don't stall an airliner!

Maybe don't even practice loss of control. Do you think the pilots who work for Southwest trade stories at the bar about adventures while stalling a 737? I hope not!

Here is what Dan says:

Learn to define and honor the 1.3 buffer at all times. Define it, placard it, honor it. It is what the airlines do every day. Do you think they memorize all those speeds? No, they are clearly defined and placarded for them at all times.

What is the "tool" or the "placard" he is referring to?

On a small aircraft, the tool [in this case] is a bright piece of tape on the airspeed indicator. When the pilot is trying to figure something out, one thing he does not have to remember is the minimum maneuvering speed for the machine he or she is operating.

This maneuvering speed is calculated ahead of time to allow up to a 30 degree bank angle.

The airlines and most 135 operators of large aircraft operate their own training and testing (all simulator based) under a program called AQP, or Advanced Qualification Program. Under AQP, each airline gets to decide what to train [for]

The airline record is impressive...as they now train and check all possible scenarios (called maneuvers) known to be problematic over the course of time.

Dan also makes the point that none of this requires detailed federal regulations. All that is required is that you have a plan for handling the hazards of your particular operating environment.

Procedures for Software Engineering

Build, Test, Integrate

Build/test source code < 10 seconds
Trial deployment < 3 minutes
Push to all users < 30 minutes

Software engineers don't have standard operating procedures, but every well-managed project has a substantial list of rules and processes to follow. Don't believe me? Try submitting a patch to your favorite open-source project and find out how much correction you receive.

The strategy a software project employs is composed of everything we need to make a stabilized approach:

I'm assuming that you are shipping tests with the code, but maybe this doesn't make sense for what you're doing. There are many times when you put together some quick specialized tests and then throw them away. This all depends on what you're doing.

The point is that your approach to development is able to transition from one stable condition to another stable condition. To go back to the example of aviation: a stabilized approach lightens the workload and gives you situational awareness. How so? You've done all the figuring in advance. Once everything is set you have nothing to do but watch how it's going.

Authoring Systems Configuration

Test configuration on an existing host < 10 seconds
Provision new infrastructure < 3 minutes
Commit, propagate < 30 minutes

This slide is systems configuration when framed from the perspective of software engineering.

Productivity (and mental health) in software development and systems administration correlates directly with the time it takes to validate a change. These numbers are merely examples, but you need to have something in mind, because a delay in feedback changes how your development cycle works.

If the deployment mechanism takes more than 15 seconds, it's not useful for providing interactive feedback. If you can't iterate on a problem, you will inevitably start to compensate by manually testing a fragment outside of your repository, and then by staring at a change long enough to convince yourself it's correct.
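One way to keep a feedback budget honest is to measure it. The following is only a sketch: `within_budget` is a hypothetical helper, not part of any tool mentioned in this talk, and it assumes `date +%s` is available (it is on the BSDs and Linux, but it is not strictly POSIX).

```shell
#!/bin/sh
# within_budget SECONDS COMMAND [ARGS...]
# Hypothetical helper: run COMMAND, report how long it took on stderr,
# and return non-zero if the command failed or exceeded the budget.
within_budget() {
    budget=$1; shift
    start=$(date +%s)
    "$@"
    status=$?
    elapsed=$(( $(date +%s) - start ))
    echo "elapsed: ${elapsed}s (budget: ${budget}s)" >&2
    [ "$status" -eq 0 ] && [ "$elapsed" -le "$budget" ]
}
```

For example, `within_budget 10 make test` would flag a build/test cycle that has drifted past ten seconds.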

Noteworthy Projects

2012 entr(1) Run arbitrary commands when files change
http://eradman.com/entrproject/
2015 pg_tmp(1) Run tests on an isolated, temporary PostgreSQL database
http://eradman.com/ephemeralpg/
2018* rset(1) : pln(5) Configure systems using any scripting language
http://scriptedconfiguration.org/

Artists will often talk of "finding your voice". Why is this important to people?

I have administered BSD and Linux systems for a long time, and seem to gravitate toward strategies that feel like test-driven development. For better or worse I have to express what I'm up to in these terms.

There's a star next to that last project because it's relatively new, and it's the topic of this presentation.

Unlike some other concepts which evolve and build over time, the concept for rset(1) came to me in a sudden moment of inspiration. It was mostly the result of spending three years working with SaltStack.

Declarative Configuration

/etc/php-7.2/mysqli.ini:
    file.symlink:
        - target: /etc/php-7.2.sample/mysqli.ini

Templates, variables

Massive APIs

Modules and extensions

Product roadmap (features, bugs)

The longer I worked with Salt the more I valued its capabilities, and the more I found the framework to be an obstacle to what I was trying to accomplish.

  1. Templates are great for building documents, terrible for general-purpose programming
  2. Using primitive data structures to call functions is a taxing way to write programs
  3. Adding your own behavior requires you to become a platform expert, just like a third-party integrator

Let me elaborate on that last point: typically you are using a DSL or writing YAML that maps to a framework's massive API.

Did I leave out any of the pain points that give you the most trouble with configuration management?

Far too often we commit the change in order to test the change, because development environments are difficult to configure and maintain.

Declarative Orchestration?

# step 1
/usr/local/bin/mysql_install_db:
    cmd.run:
        - creates: /var/mysql
# step 2

Dependencies, sequencing

Progressive status?

Same arid programming environment

When a configuration system grows a set of features large enough to overflow a river bank, it can be labeled an orchestration framework. What's not to like?

  1. Expressing an action is not too difficult, but chaining events based on the result of a previous step is a nightmare
  2. No mechanism for printing progressive status messages
  3. No automatic method of staging helper utilities and libraries

Orchestration is an advanced topic for configuration management, but all of you do this already. It's called scripting.
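As a rough illustration of that claim, chaining on the result of a previous step and printing progressive status are things a shell provides directly. The sketch below is self-contained and uses a temporary directory in place of a real database directory; `step` is a hypothetical helper, not part of any tool discussed here.

```shell
#!/bin/sh
# step LABEL COMMAND [ARGS...]
# Print a status line, run the command, and report a failure on stderr.
step() {
    label=$1; shift
    printf '==> %s\n' "$label"
    "$@" || { printf '!!! %s failed\n' "$label" >&2; return 1; }
}

# Stand-in for a data directory such as /var/mysql so this runs anywhere.
datadir=$(mktemp -d)/mysql

# Each step runs only if the previous one succeeded.
step "create data directory" mkdir "$datadir" &&
step "write marker file" touch "$datadir/initialized" &&
step "verify" test -f "$datadir/initialized"
```

If any step fails, the chain stops there and the failure message names the step, which is the behavior that is hard to express in a declarative state file.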

Oversimplification

ssh 10.5.5.1 < base-cfg.sh
ssh 10.5.5.1 < configure-wordpress.sh

What's missing?

Let's take a look at scripting. This is a solution that is too simple. First, we need files on the remote host.

Second, we need a convention for associating configuration with hosts, and a means of running only part of the configuration.
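A minimal sketch of such a convention, assuming a plain-text map with one `host script` pair per line. The map format, the function name `plan`, and its optional filter argument are all illustrative assumptions. The function only prints the ssh commands it would run, so it is safe to try.

```shell
#!/bin/sh
# plan [SUBSTRING]
# Read "host script" pairs on stdin and print one ssh command per pair.
# If SUBSTRING is given, only scripts whose name contains it are selected.
plan() {
    want=${1:-}
    while read -r host script; do
        # skip blank lines and comments
        case $host in ''|'#'*) continue ;; esac
        case $script in
            *"$want"*) printf 'ssh %s < %s\n' "$host" "$script" ;;
        esac
    done
}

# Select only the WordPress part of the configuration.
plan wordpress <<EOF
# host      script
10.5.5.1    base-cfg.sh
10.5.5.1    configure-wordpress.sh
EOF
```

Piping the output to `sh` would execute the selected commands. Even so, this still leaves the first problem unsolved: nothing has been staged on the remote host.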

Configuration Fundamentals

Add/upgrade packages

Install files, directories, symlinks

Enable/start/restart services

} Map units of work into a profile for each host

There are a few operations that configuration management systems must have.

If you have a way to install files, packages, and services you now have systems which are mostly reproducible. With only these three things, you can accomplish some valuable tasks:

  1. Rebuild or replace servers
  2. Make uniform changes across all hosts
  3. Keep a versioned history of configuration changes that the rest of your team can follow

These are critical capabilities, and it's for good reason that configuration management has become mainstream.

rset(1): Remote Sequential Execution Tool

Staging of utilities on the remote machine

Secure access to remote files: rinstall(1)

Ability to run script fragments across one or many hosts: pln(5)

Run many script fragments over the same channel: ssh(1)

rset(1) is a tool for executing scripts with access to particular resources.

The ability to execute scripts is not enough. Nearly always you also need a collection of utilities or utility libraries. rset(1) creates a temporary directory populated with state.

A web server provides access to files that you will need to install. And some built-in utilities know how to install or modify files.

Progressive Label Notation is a tab-indented file format that allows you to organize configuration. It is always evaluated in order, and allows you to set some parameters that apply to subsequent operations.

OpenSSH supports something called a control master, which lets you ship files and run many scripts with no connection overhead.
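To make the control master idea concrete, here is a sketch using standard ssh(1) options (-M, -S, -N, -f, -O). It is not a description of how rset(1) is implemented. HOST, SOCK, and the three function names are placeholders chosen for illustration.

```shell
#!/bin/sh
# Sketch: run several commands over one SSH connection using a control
# master. Nothing connects until the functions are called.
HOST=${HOST:-10.5.5.1}
SOCK=${SOCK:-/tmp/cm-example-$$.sock}

# Open a master connection in the background (-f) with no remote command (-N).
cm_open()  { ssh -M -S "$SOCK" -f -N "$HOST"; }
# Run a command over the existing connection; no new handshake is needed.
cm_run()   { ssh -S "$SOCK" "$HOST" "$@"; }
# Ask the master to exit, which also removes the socket.
cm_close() { ssh -S "$SOCK" -O exit "$HOST"; }
```

A session would then look like `cm_open && cm_run uname -a && cm_run uptime; cm_close`, paying the authentication cost only once.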

pln(5): Progressive Label Notation

Blocks of configuration can be selected individually
Label names beginning with [0-9a-z] are excluded by default:

root_tasks:
   crontab - <<-EOF
       ~ 1 * * * /usr/local/bin/renewcert
   EOF

Parameters will apply to subsequent labels

interpreter=/bin/sh -x

rset(1) uses its own container format. This is different, and I think it sets rset apart from other attempts at minimalist configuration management systems.

The content of each label is indented with a tab. If you don't have a good text editor, this might be a problem for you. Why tabs?

[demo]

Configuration Mapping

routes.pln associates configurations with each hostname

vm2.eradman.com: vm2/
   vm2.pln
   wordpress.pln

172.16.0.5: alpine/
   alpine_vm.pln

Generate this file to handle dynamic inventory

The "top-level" configuration file is called routes.pln by default. Entries are always run in order. Paths after the : are files (config, scripts, libraries, anything) that you want staged on the remote host.

Dynamic inventory is a feature that you can handle yourself. rset reads a file. Use a template language or any other means you'd like to generate that file.
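As a sketch of what generating that file could look like, the function below turns a simple inventory listing into the layout shown on the slide. The input format (host, directory, then .pln files on one line) and the name `gen_routes` are assumptions made for this example, and the output layout is inferred from the slide rather than from the pln(5) manual.

```shell
#!/bin/sh
# gen_routes: hypothetical generator for a routes file.
# Input lines on stdin: HOST DIRECTORY PLN_FILE...
gen_routes() {
    while read -r host dir files; do
        # skip blank lines and comments
        case $host in ''|'#'*) continue ;; esac
        printf '%s: %s/\n' "$host" "$dir"
        # $files is intentionally unquoted so it splits into words
        for f in $files; do
            printf '\t%s\n' "$f"
        done
        echo
    done
}

gen_routes <<EOF
vm2.eradman.com vm2 vm2.pln wordpress.pln
172.16.0.5 alpine alpine_vm.pln
EOF
```

In practice the here-document would be replaced by whatever produces your inventory, with the output redirected to routes.pln.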

Standard Utilities

rinstall(1)

./rinstall xa10/pf.conf /etc/pf.conf \
    && pfctl -f /etc/pf.conf

rsub(1)

 ./rsub /etc/firefox/unveil.main <<-CONF
   /usr/local/heimdal/lib r
   /usr/lib r
 CONF

Some solutions are too simple. Landing on a remote host with only a staging directory and /bin/cp is not enough. These two shell utilities handle all Unix-like platforms.

rinstall:

rsub:
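The `&&` in the rinstall example on the slide suggests a useful convention: the install step succeeds only when it actually changed something, so the reload runs only when needed. The sketch below imitates that idiom under this assumption. `install_if_changed` is a hypothetical stand-in, not rinstall itself, and it does not fetch files or set ownership and permissions.

```shell
#!/bin/sh
# install_if_changed SRC DST
# Hypothetical stand-in for the idiom on the slide, NOT rinstall itself.
# Copies SRC to DST only when the content differs.
# Returns 0 if DST was written, non-zero if it was already up to date.
# A real tool would also distinguish "no change" from a copy error.
install_if_changed() {
    src=$1 dst=$2
    if [ -f "$dst" ] && cmp -s "$src" "$dst"; then
        return 1
    fi
    cp "$src" "$dst"
}

# Demonstration in a temporary directory.
tmp=$(mktemp -d)
echo "pass in all" > "$tmp/pf.conf.new"

# The first call installs the file; the second finds nothing to do.
install_if_changed "$tmp/pf.conf.new" "$tmp/pf.conf" && echo "changed: reload"
install_if_changed "$tmp/pf.conf.new" "$tmp/pf.conf" && echo "changed: reload"
rm -r "$tmp"
```

The message is printed once, which is the property that keeps repeated runs from restarting services needlessly.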

Call Home: Tunnel Endpoint / Roaming Client

# jumphost/hostname.wg0
wgport 111 wgkey JUMP_HOST_PRIVATE_KEY
wgpeer ROAMING_HOST1_PUBLIC_KEY wgaip 10.0.0.20/32
wgpeer ROAMING_HOST2_PUBLIC_KEY wgaip 10.0.0.21/32
inet 10.0.0.1/24
# thinkpad10/hostname.wg0
wgkey ROAMING_HOST1_PRIVATE_KEY
wgpeer JUMP_HOST_PUBLIC_KEY wgendpoint proxy.xyz.com 111 wgaip 0.0.0.0/0
inet 10.0.0.20/24

There are some solutions that client-server systems seem to be well suited for. I don't like them for two reasons:

  1. You are putting the most vulnerable system [the configuration host] on the edge of a network.
  2. A meaningful test environment is nearly impossible

But I don't think you need a pull-based configuration scheme. One alternative is to use WireGuard to build links.

  1. Either side can initiate the connection
  2. IP forwarding and routes are optional
  3. In-kernel wg(4) interface guarantees the identity of IPs and interfaces

The only thing you need is a cron job that sends a ping once in a while to establish the tunnel.

Accessing Hosts Through a Tunnel

ssh-agent(1) also cooperates with a jumphost

# .ssh/config
Host 10.0.0.20
    ProxyJump 192.168.0.2
Host 10.0.0.21
    ProxyJump 192.168.0.2

rset(1) doesn't need any connection options because OpenSSH is already superb.

Conclusion

Factor out common or complex operations into light-weight utilities

Stage configuration data, scripts, and utilities on the remote host

Run scripts sequentially on the remote host from the staging directory

I have heard the observation that there is a paradox with respect to learning any topic. The first is that the more you know, the more you can see there is to learn. The second is that once you've mastered a topic you finally see how simple it was all along.

Traditional configuration management is useful, but I wonder if our collective experience will lead to something else. That is, a new generation of tools which provide a simple convention for handling common configuration changes without ensnaring complex tasks in an external framework.

With this mental model, "orchestration" is a flamboyant term for scripting with supporting configuration and data files already staged.