FreshRemote.Work

Site Reliability Engineer

Mountain View, California 94041, United States; or Remote

Overview

Love staying ahead of the growth curve and experimenting with new software and environments? Get on board as an Atlassian Site Reliability Engineer.

Responsibilities

As a Site Reliability Engineer (SRE), you will actively work to improve the performance and reliability of our services, address the root causes of incidents, and reduce incident rates.

You will dive deep into the services we support, own both the problems and their solutions, and automate away repetitive work.

You'll also respond to pings, pages, and alerts, investigating issues in our systems that you can really sink your teeth into.

The best person for this role is someone with a collaborative spirit. In our world, it's not about being a hero and having all the answers; it's about sometimes saying "I don't know" and working toward a solution rather than starting from an assumption.

The team needs someone who can ask questions, learn from others, and turn chaos into order. You will serve in a weekly on-call rotation to make sure our products meet established SLAs.

This role would be a great fit for someone with creative and innovative problem-solving skills and a willingness to take responsibility for their code all the way to production. You will develop and implement solutions that operate at scale, seeing your own technology efforts directly improve the reliability of our services. Our teams are empowered and expected to improve our products to truly deliver a reliable experience to customers. To realise this goal, you will own development efforts in every sprint, from planning to delivery, and collaborate with other team members to review code.

One thing we promise: you’ll never be bored.

Qualifications

On your first day, you will have experience in:

  • Writing code in Bash and Python

  • Triaging and diagnosing user-facing service outages

  • Capacity planning, demand forecasting, software performance analysis, and systems tuning

  • Configuring and managing enterprise monitoring solutions

  • Linux systems

  • Building, automating, and maintaining infrastructure in Amazon Web Services

  • Maintaining a high standard of code quality

We'd be super excited if you have:

  • Experience maintaining configuration management and orchestration tools such as Ansible and Puppet

  • Experience with container tooling such as Docker and Kubernetes, and with microservices architectures

  • Understanding of ITIL terminology for incident and problem management

  • Experience managing and troubleshooting continuous integration pipelines
