FreshRemote.Work

Staff Site Reliability Engineer

San Francisco, CA (Remote)

Our mission is to make higher education accessible and affordable for everyone. We empower students with financial support and supercharge their ability to pay down their debt, so they can get on the right financial track, fast.

We build tools that help people feel in control of their financial future, including:

  • Private student loans - low rates, people-first service, and flexible payments.
  • Student loan refinancing - break free from high interest rates or high monthly payments.
  • Scholarships - access to thousands of scholarships to help students pay less.

Earnies are committed to helping students live their best lives, free from the stress of student debt. If you’re as passionate as we are about our mission, read more below, and let’s build something great together! 

The Staff Site Reliability Engineer position will report to the Lead Cloud Engineer.

As a Staff Site Reliability Engineer, you will:

  • Ensure the reliability, scalability, performance, and security of systems while managing and optimizing infrastructure for efficiency and minimal downtime.
  • Develop, maintain, and enhance observability and CI/CD tools (Splunk, New Relic, GitHub Actions, Terraform, etc.), streamline deployments, update documentation, and improve internal tools for efficiency and scalability.
  • Lead product initiatives, conduct resiliency reviews, coordinate cross-team efforts, and manage goals, risks, and resources for successful delivery.
  • Advise on reliability, mentor engineers on best practices, facilitate cross-team communication, and translate stakeholder needs into technical solutions.
  • Lead key projects, stay current with industry trends, formalize best practices, and mentor engineers in building and troubleshooting reliable distributed systems.

About You: 

  • 8+ years of experience in reliability, scalability, performance, security, and enterprise system architecture, with a focus on toil reduction and best practices implementation.  
  • Strong coding skills in at least one language (Go, Python, Java Spring Boot, .NET, etc.) and deep knowledge of software applications, technical processes, and emerging disciplines.  
  • Hands-on experience with monitoring and telemetry tools (Grafana, Prometheus, Datadog, Splunk, etc.), SLO alerting, and CI/CD tools (Jenkins, GitHub Actions, GitLab, Terraform).  
  • Expertise in containerization and orchestration (Kubernetes, Docker, ECS) and troubleshooting networking and distributed system issues.  
  • Experience creating infrastructure resources using Terraform or OpenTofu, with formal training or certification in software engineering and 5+ years of applied experience. 
  • Willingness to travel to the Oakland office monthly to collaborate with other Earnies.

Where:

  • This role will …