Eric Shane Radman

8 Tudor Dr.
Endicott, NY 13760

ericshane@eradman.com
646-761-8452

Professional Experience

8/2022 – Present Starfish Storage

Systems Engineer, Remote

Standardized Redash deployments by automatically tagging each month's release candidate and testing release images against the Starfish stack. Lightened internal maintenance by reviewing outside contributions and integrating patches with the upstream project.

Implemented a method of expiring database records to allow each volume under management to efficiently remove history that is no longer needed. Documented the tradeoffs and best practices for repacking/reindexing databases as large as 30 TB.

Extended the capabilities of Redash to allow for informative email alerts. Developed periodic queries to provide trend information about storage under management.

Designed a tool for sending storage reports over e-mail by rendering Redash dashboards as PDFs along with raw data from individual queries. Developed a configuration format which enabled dashboard parameters to be customized for each recipient. StarfishStorage/redash-email was released as an open source project in 2024.

6/2010 – 7/2022 D. E. Shaw Research, LLC

Systems Administrator, Endicott NY

Implemented autofailover for PostgreSQL using Consul and a custom health check daemon which enabled nearest_n service discovery. Extended existing configuration management to handle provisioning. Used Consul Template to redirect existing connections on PgBouncer after a leader is promoted.

Collaborated on Anton 3 task scheduling using Slurm. Ensured that logical Anton submachines could be effectively reconfigured by allocating frontend resources on Kubernetes.

Creator of a REST service and Web UI that provides access to simulation data based on group membership. Authentication tokens are obtained using Kerberos or MUNGE. Used PostgreSQL logical replication to subscribe to upstream data.

Designed a service for monitoring database availability characteristics such as connect times, recovery status and disruption of idle connections.

Reorganized PostgreSQL administration using a set of scripts after finding that the learning curve of SaltStack was prohibitive for other team members. Database relationships and major versions were defined in a single YAML file.

Established database roles for all Anton services and created a schema layout using a minimum set of privileges for each role. Institutionalized a mechanism for retrieving database credentials, and adapted test harnesses to use the common schema.

Implemented a test harness for cluster provisioning using an SQLAlchemy-derived schema. Accomplished progressive migration from MySQL by periodically refreshing a PostgreSQL instance for read-only services.

Release engineer for Desjob, used to create and start simulations on Anton and Anton 2. Automatic failure recovery and data-driven error classification enabled simulations to survive faulty hardware.

Primary author and maintainer of AMSv2, an application server used to provide administrative control over the logical and physical components of Anton 2. Improved performance characteristics over time by restructuring service methods around tests to verify that the object/relational mapper was generating correct queries.

Conducted approximately 140 phone screens and in-house interviews for positions in Operations, System Software, System Administration, and Scientific Software. In all cases care was taken to produce an essay describing a candidate's potential and fit for a given position.

3/2009 – 6/2010 D. E. Shaw Research, LLC

Systems Contractor, Endicott NY

Key developer for AMS, an application server used to provide record keeping, administrative control, and simulation state for Anton. Provided the Operations team with a user interface for visualizing submachine and queue utilization.

Assisted with maintenance of a 1500-node Linux cluster by routinely diagnosing and replacing faulty hardware. Facilitated the installation and functional testing of eighteen 512-node Anton supercomputers.

10/2001 – 7/2009 Telecommunication Enterprises Inc

Network Administrator, Clarks Summit PA

Handled all technical aspects of running a regional ISP, teisprint.com. Provided support and generated documentation for 8 dedicated T1/ISDN accounts, 1300 dial-up users, and 2100 e-mail addresses. Executed a migration of all services and equipment to a new colocation facility during an acquisition of ezaccess.net.

Enabled the business office to provide support to e-mail and dial-up customers by implementing a complete account management interface and tutoring staff on basic troubleshooting techniques using command line utilities.

Designed an in-house ticket management system aimed at improving visibility on outstanding issues. Substantially improved the effectiveness of a small team by making progress visible.

Provided on-site consulting in voice and data networking to an average of sixty companies in Northeastern Pennsylvania.

Certifications

2013 BSD Associate recertification
Philadelphia PA
2009 BSD Associate from the BSD Certification Group
Washington D.C.
1997 Troubleshooting, Maintaining, and Repairing Personal Computers
Open book exam

Talks and Presentations

Minimalist Scripted Configuration, Agent-Server, Gather-Fact, and Remote Execution
NYCBUG, June 2, 2021

Single-Minded Configuration, A Stabilized Approach to Systems Orchestration
Recording, December 19, 2020

Overcoming First Principles, A guide for accessing the features of PostgreSQL in test-driven development
PGConfUS, April 19, 2016

Learning Through Composition, A study in building modern Unix tooling
NYCBUG, January 13, 2015

Knowledge Sharing and Open-Source

2021 – Present sidecomment.io

A web service which allows users to become contributors to web pages by selecting text and submitting comments. Each comment creates a ticket which gives site operators the opportunity to refine and incorporate feedback.

2018 – Present scriptedconfiguration.org

Project page for rset(1)/pln(5), an approach to configuring remote systems using common scripting languages and tools.

2018 – Present raticalphotography.com

A site to showcase some of my work in analog photography, while providing others with an explanation of techniques that I have acquired in using cameras that do not have advanced automation.

2014 – Present eradman.com/ephemeralpg

Project page for pg_tmp(1), a tool for quickly spinning up temporary PostgreSQL databases. Crafted to give unit tests full access to the capabilities of the database.

2012 – Present eradman.com/entrproject

Project page for entr(1), a cross-platform tool for running arbitrary commands when files change. Crafted to promote rapid feedback and automated testing.

1999 – Present eradman.com

Tutorials and commentary on programming, networking, databases, and administration of Unix-like systems.

Last updated on March 23, 2024