Site Reliability Engineer

Remote - US East Coast

GitHub

Posted 1 week ago

GitHub is seeking software engineering professionals to join its new SRE team. As a valued member of our close-knit team, you will bring your passion for building fault-tolerant systems and reliable software to help us steward reliability as a feature throughout the organization. Your work will help us scale the world's largest code hosting platform.

Our charter is broad but our focus is to improve the availability, resilience, and sustainability of GitHub's products. We do this through architecture, technology, process, and partnerships with product teams.

Our SRE team is highly distributed; our work environment is one of remote work, asynchronous communication, trust, and respect. Through your strong written communication and software skills, you will develop meaningful working relationships with coworkers from around the globe.

The SRE role at GitHub is an opportunity to blend your system design, empathy, and software engineering skills on an ever-changing set of novel reliability challenges. Join us on this journey and have a meaningful impact on how the world builds software.

Responsibilities:

  • Exert technical influence to improve the reliability of our products and systems.
  • Develop and maintain infrastructure products and software automation.
  • Integrate with third-party solutions where it makes the most sense.
  • Work closely with our observability and chaos engineering teams.
  • Cultivate GitHub's open source projects and build things you are proud to share.
  • Steward reliability as a feature across the organization through concepts such as SLOs and service maturity.

Minimum Qualifications:

  • Comfort with the GNU/Linux operating system.
  • Experience with distributed systems with high availability requirements.
  • Exposure to system-level languages such as Go or C/C++.
  • Familiarity with configuration management software such as Puppet, Ansible, or Salt.
  • Familiarity with infrastructure services and sidecar patterns.
  • Experience balancing reliability, sustainability, and technical debt for services running at scale.

Preferred Qualifications:

  • Experience with highly available systems at scale.
  • Experience building infrastructure and automation.
  • Experience negotiating SLIs, SLOs, and SLAs with product owners.
  • Success in a remote work environment.
  • Incident response and/or incident management experience.
  • Exposure to CNCF projects such as Kubernetes or Prometheus.

Who We Are:

GitHub is the developer company. We make it easier for developers to be developers: to work together, to solve challenging problems, and to create the world’s most important technologies. We foster a collaborative community that can come together—as individuals and in teams—to create the future of software and make a difference in the world.

Leadership Principles:

Customer Obsessed - Trust by Default - Ship to Learn - Own the Outcome - Growth Mindset - Global Product, Global Team - Anything is Possible - Practice Kindness

Why You Should Join:

At GitHub, we constantly strive to create an environment that allows our employees (Hubbers) to do the best work of their lives. We've designed one of the coolest workspaces in San Francisco (HQ), where many Hubbers work, snack, and create daily. The rest of our Hubbers work remotely around the globe. Check out an updated list of where we can hire here: https://github.com/about/careers/remote

We are also committed to keeping Hubbers healthy, motivated, focused, and creative. We've designed our top-notch benefits program with these goals in mind. In a nutshell, we've built a place where we truly love working, and we think you will too.

GitHub is made up of people from a wide variety of backgrounds and lifestyles. We embrace diversity and invite applications from people of all walks of life. We don't discriminate against employees or applicants based on gender identity or expression, sexual orientation, race, religion, age, national origin, citizenship, disability, pregnancy status, veteran status, or any other differences. Also, if you have a disability, please let us know if there's any way we can make the interview process better for you; we're happy to accommodate!

Please note that benefits vary by country. If you have any questions, please don't hesitate to ask your Talent Partner.

Job tags: Ansible, Distributed systems, Kubernetes, Linux, Open Source, Puppet, SRE