Collaborated on Anton 3 task scheduling using Slurm. Ensured that logical Anton submachines could be effectively reconfigured by allocating "frontend" resources on Kubernetes.
Creator of a REST service and Web UI that provides authenticated access to simulation data. Authentication tokens obtained using Kerberos or MUNGE. As with other projects, built tractable server-side tests by spinning up a temporary database to run unit tests.
Built a service for monitoring database availability characteristics such as connect times, recovery status, and disruption of idle connections. Eliminated manual monitoring steps by adding URIs to our database host map.
Reorganized PostgreSQL administration around a set of scripts after finding that the learning curve with Salt was prohibitive for most other team members. Database relationships and major versions were defined in a single YAML file. Core logic was implemented using a utility written in Python, and special policies were implemented in shell.
Established database roles for all Anton services and created a schema layout using the minimum set of privileges required for each role. Institutionalized a mechanism for retrieving database credentials, and adapted test harnesses to use the common schema definition.
Implemented a test harness for the cluster provisioning system by deprecating legacy tooling that relied on MySQL. A SQLAlchemy-derived schema was used to run tests on an ephemeral PostgreSQL instance. Accomplished progressive migration from MySQL by periodically refreshing a PostgreSQL instance for read-only services to use.
Fully automated deployment of PostgreSQL databases using Salt. Database profiles defined such features as installation of custom extensions, proxy configuration, and point-in-time recovery. Administrative tasks such as major version upgrades, hot failover and status queries were also standardized using Salt execution modules.
Release engineer for Desjob, the command-line tools used to create and start simulations on Anton and Anton 2. Automatic failure recovery and data-driven error classification enabled simulations to survive faulty hardware or disruptions in access to network resources. Provided stable releases by maintaining multiple layers of verification, including frequent design discussions, code review, automated system tests and nightly replay of simulations.
Primary author and maintainer of AMSv2, an application server used to provide automated and administrative control over the logical and physical components of Anton 2. Improved performance characteristics over time by restructuring service methods around tests that verified that the object/relational mapper was generating correct queries. Handled all aspects of database server maintenance, from schema migrations to hardware upgrades.
Conducted more than 130 phone screens and in-house interviews for positions in Operations, System Software, System Administration, and Scientific Software. In all cases great care was taken to produce an essay that described a candidate's potential and fit for a given position.
Key developer for AMS, an application server used to provide record keeping, administrative control, and state used by running simulations and the logical components of Anton. Time to deliver new features radically reduced through the introduction of a full set of unit tests. Provided the Operations team with a responsive user interface for visualizing submachine and queue utilization.
Assisted with maintenance of a 1500-node Linux cluster by routinely diagnosing and replacing faulty hardware. Facilitated the installation and functional testing of eighteen 512-node Anton supercomputers.
Handled all technical aspects of running a regional ISP, teisprint.com. Provided support and generated documentation for 8 dedicated T1/ISDN accounts, 1300 dial-up users, and 2100 e-mail addresses. Executed a migration of all services and equipment to a new colocation facility during an acquisition of ezaccess.net.
Enabled the business office to provide support to e-mail and dial-up customers by implementing a complete account management interface along with tutoring on basic troubleshooting techniques using command-line utilities.
Designed an in-house ticket management system aimed at improving visibility on outstanding issues. Substantially improved the effectiveness of a small team by making progress visible.
Provided on-site consulting in voice and data networking to an average of sixty companies each year in Northeastern Pennsylvania and the Southern Tier of New York.
1985–1997: Grades 1–12, Home Schooled
BSD Associate recertification
BSD Associate from the BSD Certification Group
Troubleshooting, Maintaining, and Repairing Personal Computers
Open book exam
A Stabilized Approach to Systems Orchestration
Recording, December 19, 2020
Overcoming First Principles,
A guide for accessing the features of PostgreSQL in test-driven
PGConfUS, April 19, 2016
Learning Through Composition,
A study in building modern Unix tooling
NYCBUG, January 13, 2015
Project page for rset(1)/pln(5), an approach to configuring remote systems using common scripting languages and tools.
A site to showcase some of my work in analog photography, while providing others with an explanation of techniques that I have acquired in using cameras that do not have advanced automation.
Project page for pg_tmp(1), a tool for quickly spinning up temporary PostgreSQL databases. Crafted to give unit tests full access to the capabilities of the database. Later, ddl_compare(1) was developed to enable comparisons of complex schemas.
Project page for entr(1), a cross-platform tool for running arbitrary commands when files change. Crafted to promote rapid feedback and automated testing.
Tutorials and commentary on programming, networking, and administration of Unix-like systems.
Last updated on April 23, 2021