Staff Site Reliability Engineer
San Francisco, CA (Remote)
Our mission is to make higher education accessible and affordable for everyone. We empower students with financial support and supercharge their ability to pay down their debt, so they can get on the right financial track, fast.
We build tools that help people feel in control of their financial future, including:
- Private student loans - low rates, people-first service, and flexible payments.
- Student loan refinancing - break free from high interest rates or high monthly payments.
- Scholarships - access to thousands of scholarships to help students pay less.
Earnies are committed to helping students live their best lives, free from the stress of student debt. If you’re as passionate as we are about our mission, read more below, and let’s build something great together!
The Staff Site Reliability Engineer position will report to the Lead Cloud Engineer.
As a Staff Site Reliability Engineer, you will:
- Ensure the reliability, scalability, performance, and security of systems while managing and optimizing infrastructure for efficiency and minimal downtime.
- Develop, maintain, and enhance observability and CI/CD tools (Splunk, New Relic, GitHub Actions, Terraform, etc.), streamline deployments, update documentation, and improve internal tools for efficiency and scalability.
- Lead product initiatives, conduct resiliency reviews, coordinate cross-team efforts, and manage goals, risks, and resources for successful delivery.
- Advise on reliability, mentor engineers on best practices, facilitate cross-team communication, and translate stakeholder needs into technical solutions.
- Lead key projects, stay current with industry trends, formalize best practices, and mentor engineers in building and troubleshooting reliable distributed systems.
About You:
- 8+ years of experience in reliability, scalability, performance, security, and enterprise system architecture, with a focus on toil reduction and best practices implementation.
- Strong coding skills in at least one language (Go, Python, Java Spring Boot, .NET, etc.) and deep knowledge of software applications, technical processes, and emerging disciplines.
- Hands-on experience with monitoring and telemetry tools (Grafana, Prometheus, Datadog, Splunk, etc.), SLO alerting, and CI/CD tools (Jenkins, GitHub Actions, GitLab, Terraform).
- Expertise in containerization and orchestration (Kubernetes, Docker, ECS) and troubleshooting networking and distributed system issues.
- Experience creating infrastructure resources using Terraform or OpenTofu, with formal training or certification in software engineering and 5+ years of applied experience.
- Willingness to travel to the Oakland office monthly to collaborate with other Earnies.
Where:
- This role will …
Job Profile
Must travel to Oakland office monthly
Benefits/Perks:
- Annual PTO
- Competitive benefits
- Employee stock purchase plan
- Health, dental & vision benefits
- Mac computers
- Monthly internet and phone reimbursement
- Parental leave
- PTO
- Remote work
- Savings plans
- Travel perk
- Tuition reimbursement
- Vision benefits
- Work from home stipend
Tasks:
- Collaboration
- Develop observability tools
- Ensure system reliability
- Lead product initiatives
- Manage infrastructure
- Mentor engineers
Skills: CI/CD, Collaboration, Communication, Datadog, Distributed Systems, Docker, ECS, GitHub, GitHub Actions, Go, Grafana, Industry trends, Infrastructure, Infrastructure Management, Java, Jenkins, Kubernetes, Monitoring, .Net, Networking, New Relic, Observability, Performance, Prometheus, Python, Scalability, Site Reliability Engineering, Software Engineering, Splunk, Spring Boot, Terraform, Troubleshooting
Experience: 8 years
Timezones: America/Anchorage, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, Pacific/Honolulu, UTC-10, UTC-5, UTC-6, UTC-7, UTC-8, UTC-9