Elena Washington

Staff Software Engineer — Infrastructure

Site Reliability and Cloud Infrastructure engineer with 17+ years of experience designing, scaling, and hardening mission-critical systems worldwide.

About Me

I'm a Staff Site Reliability Engineer with 17+ years of engineering experience (13+ in cloud/SRE), designing, scaling, and hardening mission-critical systems globally. I began my career programming control systems for power generation turbines worldwide before transitioning to cloud infrastructure. Currently at Gusto, I serve as the Staff-level domain owner for Disaster Recovery, setting reliability objectives and aligning product, security, and platform teams on DR strategy across the organization's microservices.

I'm passionate about building reliable, secure infrastructure foundations that enable teams to move fast with confidence. My work spans multi-region disaster-recovery architectures and Infrastructure-as-Code stacks, all serving millions of users. Colleagues across teams recognize me as a go-to person for infrastructure questions and a trusted mentor. I also founded two internal learning communities (Python Guild and Cloud Security Guild) to fill critical knowledge gaps and foster a culture of continuous learning.

Technical Skills

AWS
Terraform & IaC
Python, Ruby, Go
Distributed Systems
Site Reliability Engineering
Disaster Recovery
CI/CD Pipelines
Multi-Region Architecture
API Integration

Leadership & Collaboration

Community Building
Technical Mentorship
Cross-Functional Collaboration
Technical Communication
Strategic Planning
Stakeholder Management
Technical Documentation
Process Improvement

Featured Projects

Python Guild

Founded and lead Python Guild, establishing testing standards and best practices across the organization. Hosted ML/AI talks and delivered PM showcases. Created a comprehensive testing playbook that reduced Spacelift CLI test runtime by 94% (39.25s → 2.35s) while maintaining 82% coverage.

Python Testing Community Documentation

Cloud Security Guild

Co-founded Cloud Security Guild with 50+ monthly attendees, filling critical knowledge gaps in AWS security practices. Delivered a comprehensive AWS security presentation and coordinated speakers, including security experts from external companies. Recognized as filling a "sorely neglected area" at the organization.

AWS Security Leadership Education Community

Disaster Recovery Testing Automation

Built an automated failover testing platform with dual-path authentication (Keycloak + direct-to-Okta fallback) and comprehensive workflow orchestration. Enabled automated disaster recovery validation for critical production applications, ensuring business continuity during regional outages.

Disaster Recovery Automation Python API Development

Spacelift CLI & Migration Observability

Used Spacelift's GraphQL API to build a custom internal CLI from scratch when spacectl fell short, implementing run management, stack operations, shell completion, and interactive confirmations. Integrated Datadog metrics to track adoption of new IaC tooling across 1,100+ Terraform stacks, enabling data-driven decisions and velocity forecasting against leadership-imposed deadlines.

CLI Development Python Terraform API Integration Datadog Observability

Advent of Code Leaderboard

Built an internal Advent of Code leaderboard web application with REST API integration and real-time rankings. Fostered engineering community engagement and friendly competition across the organization.

Python API Integration Community Web App

Multi-Region Disaster Recovery

Directed the company-wide disaster-recovery program, automating multi-region failover tests and cutting RTO from hours to under 20 minutes. Drove cross-region KMS key migration, Aurora global clusters and Redis global replication groups, and bi-directional ECR & image replication. Recovered 5 years of container images after accidental deletion, preventing major service disruption.

AWS Disaster Recovery Terraform Multi-Region

Experience

Staff Site Reliability Engineer

Gusto - New York, NY

September 2019 – Present

  • Staff-level domain owner for Disaster Recovery, setting reliability objectives and aligning product, security, and platform teams on DR strategy across the organization
  • Directed the company-wide disaster-recovery program, automating multi-region failover tests and cutting RTO from hours to under 20 minutes
  • Drove cross-region KMS key migration, Aurora global clusters and Redis global replication groups, and bi-directional ECR & image replication; recovered 5 years of container images after accidental deletion
  • Authored 5 production Terraform modules for global-replication datastores (ElastiCache and Aurora global clusters), Route 53, and secret replication
  • Founded and lead Python Guild and Cloud Security Guild (internal learning communities with 50+ attendees each), establishing testing standards and filling critical security knowledge gaps organization-wide
  • Mentored four engineers across Infrastructure and Security teams
  • Achieved zero SOC 2 exceptions through transparent engagement with auditors, supporting compliance and contractual requirements

Software Engineer, Infrastructure Automation

Compass - New York, NY

February 2019 – August 2019

  • Built an Ansible laptop bootstrap, reducing engineer onboarding time from half a day to under 45 minutes
  • Co-developed an internal PaaS abstraction over AWS, enabling 100+ engineers to deploy microservices via self-service with governed IAM roles and CI gates
  • Advocated for inclusive hiring and development practices to senior management

Site Reliability Engineer

Beeswax - New York, NY

May 2018 – February 2019

  • Maintained a revenue-critical ad-tech platform processing 2M QPS under a 100 ms latency SLO
  • Extended Python CLI to automate Spot/EC2 fleet operations, EMR jobs, and blue-green deploys
  • Played an instrumental role in humanizing the on-call process and improving operational reliability

DevOps Engineer

NS1 - New York, NY

September 2017 – April 2018

  • Formed and led a cross-functional squad that equipped operations engineers with dashboards, runbooks, and alerting to operate a globally distributed managed DNS platform
  • Improved mean time to detection by 30% through better tooling and processes
  • Instituted lightweight post-mortem and hand-off processes that enabled 25% traffic growth while keeping operations headcount flat
  • Managed a deep-dive project on RabbitMQ cluster health; deployed configuration changes that halved operator pages and doubled message throughput
  • Mentored junior engineers and participated in follow-the-sun on-call rotation

Site Reliability Engineer

Greenhouse Software - New York, NY

June 2015 – September 2017

  • Built and maintained an in-house PaaS on AWS using Docker containers and Marathon/Mesos (later Kubernetes) for development and production workloads
  • Configured and administered SMTP mail servers and Redis clusters
  • Started an internal Infrastructure Support process to simplify the support workflow, reduce wait times and identify common issues
  • Organized and ran a learning and development series for junior employees interested in engineering
  • Spearheaded an externally recognized Diversity and Inclusion initiative and a public Meetup promoting diversity in tech

DevOps Engineer

Stocktwits - New York, NY

January 2015 – June 2015

  • Maintained a fleet of dedicated servers
  • Assembled a detailed plan and set of recommendations on moving to cloud providers

Systems Engineer

Opower - Arlington, VA

April 2014 – January 2015

  • Ran day-to-day operations for over 800 virtual and physical servers
  • Provided training and documentation about systems, tools, and best practices to colleagues of various technical proficiencies
  • Participated in an on-call rotation to respond to system emergencies
  • Built and maintained automated and self-service engineering tools

Associate Professional Services Engineer

Opower - Arlington, VA

September 2012 – March 2014

  • Configured and customized Opower's energy efficiency software to meet specific needs of utility clients
  • Maintained and provided escalated support for utility energy efficiency programs after launch
  • Analyzed, transformed, and loaded utility data from client systems

Controls Commissioning Engineer

Alstom Power

June 2008 – August 2012

  • Programmed and tested control systems that automated turbine power generation equipment (gas, steam, coal) for routine and emergency maintenance globally
  • Built automated tools to streamline field commissioning workflows, improving team productivity
  • Provided technical support to sales and commercial operations from prospecting through commissioning
  • Served as a customer-facing technical expert supporting critical infrastructure worldwide

Education

Bachelor of Science in Mechanical Engineering

Virginia Commonwealth University

Richmond, VA

May 2008

Graduated Cum Laude

What Colleagues Say

"Elena is recognized as an evangelist, thought leader, and partner for the work she does regarding Disaster Recovery."

Engineering Manager Performance Review Feedback

"Elena routinely goes above and beyond to help with critical projects. Her knowledge, patience, and shared interest in good work made a complex Cloudflare WAF governance project possible, increasing developer velocity and freeing up the security team."

Chief Information Security Officer Cross-Team Leadership

"Elena answered SOC 2 auditor questions concisely and made space for follow-ups. Gusto did not receive any SOC 2 exceptions as a result of how she engaged the audit team."

Engineering Manager SOC 2 Audit Leadership

"Go-to person for infrastructure questions. Elena's expertise and knowledge made critical debugging significantly faster."

Privacy Engineering Team Cross-Team Collaboration

"Elena was one of two people recommended for help and mentoring. She's absolutely recognized as a leader in prodsec."

Security Team Peer Recognition

"Elena is modeling all the right traits—patience, enthusiasm, and humility. The Cloud Security Guild is filling a sorely neglected area."

Staff Engineer Guild Leadership

"Elena is hard working, fastidious, attentive, and forward thinking. She occasionally needs to be reminded to put the dishes away."

Life Partner Remote Work Colleague

Get In Touch

I'm always interested in hearing about new projects and opportunities. Please get in touch.

Currently Open To

  • Consulting Opportunities
  • Speaking Engagements
  • Technical Mentorship
  • Full-time Positions

Let's Connect

13+ Years Cloud/SRE Experience
Staff Engineer Level
NYC Based