FreshRemote.Work

Senior Site Reliability Engineer

Remote

Patreon is the best place for creators to fire up their fandoms, share exclusive work, and turn their passions into lasting creative businesses. Over 250,000 podcasters, writers, musicians, artists, and other creative people use Patreon to reach their biggest fans directly and earn an income for the value they provide. Creators can offer paid memberships that unlock access to exclusive work and community, or sell individual digital items from their own Patreon shops. 

Ultimately, our goal is simple: fund the creative class. And we’re leaders in that space, having sent over $3.5 billion to creators since our founding. We’re continuing to invest heavily in building the best creator tools with the best team in the creator economy, and are looking for a Site Reliability Engineer to support our mission.

This role is remote with optional in-person attendance in either the San Francisco or New York offices. Expect to travel a handful of times per year for team-building and collaboration offsites.

About the Role

  • Contribute to high-impact AWS cloud infrastructure initiatives to improve the performance, reliability, and cost efficiency of Patreon’s rapidly growing platform.

  • Participate in operability reviews and production readiness reviews to ensure the scalability, resiliency, and operability of new and existing product features.

  • Advocate and implement Site Reliability Engineering practices including SLIs, SLOs, and SLAs across the engineering organization to improve our operational excellence.

  • Enhance the feature set of our new Kubernetes developer platform and work with partner teams to migrate their workloads over to it.

  • Provide a delightful and automated experience to our constituent teams by developing tooling and automation to facilitate self-service for routine tasks.

  • Support and maintain critical infrastructure components including our infrastructure-as-code project, centralized observability stack, and Cloudflare edge.

About You

  • You have experience in DevOps, Site Reliability, or backend/infrastructure engineering for a company experiencing fast-paced growth.

  • You are proficient with a programming language such as Python and with shell scripting.

  • You have hands-on experience implementing Site Reliability Engineering practices (SLIs, SLOs, SLAs) and like using metrics to make data-driven decisions.

  • You are knowledgeable in configuration management with a framework such as Terraform, Ansible, Chef, or Puppet.

  • You've worked with continuous integration and deployment systems, and have ideas about how to build and improve those systems.

  • Your documentation and verbal communication skills are excellent, and you're able to collaborate and rally support with people on and off your team.

  • You …
