Eric Radman : a Journal

Refactoring Legacy Systems

Write-Ahead Logging

Instead of keeping a recording what did happen, keep a list of improvements that will be made, and them execute them. The obvious benefit is that you and your team can maintain focus. It may be less obvious that the checklist communicates architectural values and thereby limits the type of changes that might be made.

If you're fortunate enough to be using Mercurial or Git each feature can be easily managed as an atomic unit of work by switching to a new branch for each intermediate step.

Write Regression Tests

Retrofitting an existing code base with unit tests is not always feasible, since the practice of unit testing is itself typically the motive for writing code that is more functional in nature. In this case some good functional tests are critical.

tests/recover-checkpoint/01_setup.py
tests/recover-checkpoint/02_run.py
tests/recover-checkpoint/03_continue.py
tests/recover-checkpoint/04_verify.py

Perform Rolling Upgrades

The goal of a refactoring sprint is to reach a consistent state again as quickly as possible. As soon as a coherent set of changes appears to be stable, initiate a series of upgrades that will incrementally prove the stability of the new code. Partitioning the deployment of configuration or packages also allows a team to make more aggressive changes and to start the next iteration faster. This scheme also presents an excellent context for implementing run-time health checks.

Fix Easy Tests First

After major surgery it's tempting to try to solve the failures of the most complex tests, but this is a little like trying to chop a large by aiming strait for the center. Instead look for cracks leading around the edge and start to solve the little problems first.

Treat Exceptions as Fatal

The most efficient way to create tangled code is to use exceptions as a general-purpose signaling mechanism.

def validate_step():
    # verify checksums
    if checkpoint != checkpoint :
        raise ValueError("Checkpoint invalid")

Exceptions have type but not identity, so this enables many insidious bugs to hide. We would be wise to remove this construct whenever possible. Sane flow-control can be introduced by structuring code in terms of functions that return values flow control. This is much easer to test and debug.

def validate_step():
    # verify checksums
    if checkpoint != checkpoint:
        return (False, "Checkpoint invalid")
    return (True, "Checkpoint OK")

Make Assertions

If values are expected to be in a certain range, use assertions in the code base so that unforeseen conditions at caught early.

assert sim_perf != sim_perf, "expect to be a float, but NaN is not valid"

Log Function Calls

It is sometimes useful to print some additional detail in log messages, such as the name of the module and the function that invoked a log message.

import sys
class Log(object):
    def write(self, msg):
        modname = self._modname(sys._getframe(2).f_code.co_filename)
        frame2 = sys._getframe(2).f_code.co_name
        print(%s.%s() %s" % (modname, frame3, msg)

Acknowledge Architectural Limits

At the core of every project is a paradigm that can never change. With effort a large framework may become more modular, but it will never become a minimalist library.

References

DesignStaminaHypothesis by Martin Fowler

Last updated on March 08, 2017
eradman.com/source