FreshRemote.Work

Senior Site Reliability Engineer

Remote

🌎 About Us
At TeamSnap, we believe that when the world connects through sports, the world becomes better. TeamSnap is a sports and communication platform dedicated to taking the work out of play in youth sports. We also believe our jobs should excite us, our teammates should support us and our bosses should inspire us. We empower our people to bring big ideas and tiny egos, which has landed us on Outside Magazine’s list of “Best Places to Work” and Built In’s “100 Best Remote-First Places to Work.”
TeamSnap is seeking a Senior Site Reliability Engineer to join our remote infrastructure team. You will play a pivotal role in ensuring a seamless experience for both our developers and our users. By driving improvements in the development lifecycle, automating tasks and taking our development tools to the next level with AI, you'll be the backbone of our product initiatives.
As a key member of our engineering team, you will architect and build scalable, highly available systems alongside our infrastructure team that serve millions of daily users and some of the largest youth and amateur sports organizations in the world. We value collaboration and regularly participate in pair sessions and virtual team swarms to stay connected and improve the team and company.

What You'll Do:

  • You'll build scalable, reliable systems using cutting-edge technologies like Kubernetes, Docker, Terraform and public cloud platforms, ensuring our applications reach a global audience.
  • Collaborating across teams, you'll identify pain points in the development lifecycle and build tools to improve efficiency and reliability.
  • You'll also be on the front lines during incidents, working closely with engineers across the company to quickly resolve issues and strengthen our infrastructure.
  • You'll be a champion for system reliability, continuously optimizing performance, monitoring systems and leading incident response efforts.
  • By proactively addressing issues and exploring innovative solutions, you'll ensure the smooth operation and resilience of our platform, providing an exceptional user experience.

What Will Set You Up for Success:

  • 5+ years of SRE or equivalent experience: Demonstrated success building and maintaining large-scale production systems.
  • Experience with Kubernetes, Docker, cloud platforms (ideally GCP) and IaC tools such as Terraform, along with a proven ability to monitor, scale, debug and harden web services and APIs.
  • Strong analytical and communication skills, with experience working with product engineers and participating in on-call rotations.
  • Proficiency in at least one of our core languages (Go, Elixir, TypeScript) to automate and …

Job Profile

Restrictions

Alaska, Delaware, District of Columbia, Fully remote, Hawaii, Iowa, Louisiana, Mississippi, Nebraska, New Mexico, Rhode Island, South Dakota, West Virginia

Benefits/Perks

Equitable compensation, Fully remote, Remote work, Supportive team culture

Tasks
  • Build scalable systems
  • Lead incident response
Skills

Analytical, Cloud platforms, Communication, Docker, Elixir, ELK stack, GCP, Generative AI, Go, Grafana, IaC, Kubernetes, Prometheus, Terraform, TypeScript, User Experience

Experience

5 years