Site Reliability Engineer

Remote

Thalamus

USD 200K+ Full Time Senior



Published 1 week ago

About Thalamus

Our mission is to help the right doctors practice at the right hospitals to treat the right patients. We leverage a passion for technology, medical education, equity, and data-driven research to optimize physician recruitment, starting with the medical residency recruitment process.

Our philosophy is that the opportunity to practice medicine in an ideal environment should be accessible to all, and ample medical research has shown that this leads to better overall healthcare outcomes for patients. We built a comprehensive interview management platform, backed by evidence-based research, to innovate, streamline, and optimize the residency recruitment process.

As our SRE, you will lead Thalamus's cloud infrastructure transformation initiatives. You will be responsible for architecting, implementing, and optimizing our reliability strategy across all platforms, with a focus on modernizing our cloud infrastructure. You will lead cross-functional teams to design and implement observability solutions, establish automated infrastructure provisioning, and create consistent environments on Kubernetes.

You will...

Technical Leadership: Provide expert technical guidance for cloud infrastructure, observability, and reliability engineering practices
Architecture Design: Design and implement a scalable, resilient cloud architecture leveraging Kubernetes ecosystem technologies
Observability Strategy: Lead the implementation of comprehensive monitoring and telemetry solutions to provide visibility across the entire technology stack
Automation Excellence: Champion infrastructure-as-code methodologies and implement repeatable, automated deployment patterns
Disaster Recovery: Develop and improve business continuity/disaster recovery strategies and solutions
Team Leadership: Mentor Staff SREs and other engineers on cloud-native technologies and reliability best practices
Cross-team Collaboration: Partner with development teams to establish and maintain effective Service Level Objectives

You should have...

8+ years of experience in infrastructure engineering, with at least 3 years in a senior leadership position
10+ years of AWS experience and 5+ years of Azure experience
Deep expertise with Kubernetes orchestration and ecosystem technologies
Extensive experience implementing observability solutions (metrics, logging, tracing, alerting)
Strong background in infrastructure automation using Terraform, Helm, or equivalent tools
Experience architecting high-availability systems in cloud environments
Track record of leading significant infrastructure initiatives and driving architectural decisions
Exceptional communication skills with the ability to explain complex technical concepts to diverse audiences

Bonus

Experience with multi-cloud and hybrid cloud architectures
Knowledge of Datadog or Prometheus observability stacks
Experience migrating workloads from traditional platforms to Kubernetes
Background implementing GitOps workflows for infrastructure and application deployment
Knowledge of service mesh technologies (Istio, Linkerd, etc.)
Experience implementing zero-trust security models in cloud environments

The salary range for this position is $200,000 to $250,000, plus a grant of stock options. Final compensation will be determined based on experience, skills, and geographic location.

Our Commitment

Thalamus is a mission-driven organization centered on the belief that our company should model what we want of the US healthcare system: a provider workforce whose diversity reflects the patient populations it serves. We believe this is best achieved by building a team with a diversity of backgrounds, cultures, and experiences, including “distance traveled.” Thalamus is an equal opportunity employer. We do not discriminate based upon race, religious creed, color, national origin, ancestry, physical or mental disability, medical condition, genetic information, marital status (including registered domestic partnership status), sex and gender (including pregnancy, childbirth, lactation, and related medical conditions), gender identity and gender expression (including transgender individuals who are transitioning, have transitioned, or are perceived to be transitioning to the gender with which they identify), age, sexual orientation, Civil Air Patrol status, military and veteran status, and any other consideration protected by federal, state, or local law. We encourage those who really want to make an impact and who exemplify our core values to apply for our open positions.

The actual base salary offered will be determined by experience, skills, and work location. This range covers base salary only; our total compensation also includes equity and benefits. We welcome you to apply even if your expectations are outside our listed range.

Thalamus is committed to providing reasonable accommodations for qualified individuals with disabilities in our job application procedures and throughout employment. If you need assistance or any accommodation, please let us know.

Thalamus does not accept unsolicited resumes from recruiters or employment agencies without a fully executed recruitment agreement in place. In the absence of such agreement, Thalamus reserves the right to pursue and hire any candidates without an obligation to pay fees. Agencies are requested not to contact Thalamus hiring managers or employees regarding recruiting services.

This position is based in the United States, and you must be legally authorized to work in the United States.


Job Profile

Tasks

Architect and implement reliability strategy
Champion infrastructure-as-code
Collaborate with development teams
Design scalable cloud architecture
Develop disaster recovery strategies
Implement observability solutions
Lead cloud infrastructure transformation
Mentor engineers

Skills

AWS, Azure, Cloud Infrastructure, GitOps, Helm, High-Availability Systems, Infrastructure Automation, Kubernetes, Monitoring, Observability, Reliability Engineering, Service Mesh, Telemetry, Terraform, Zero Trust Security

Experience

8 years
