Lead Site Reliability Engineer

Remote

Hims & Hers Health, Inc. (better known as Hims & Hers) is the leading health and wellness platform, on a mission to help the world feel great through the power of better health. We are revolutionizing telehealth for providers and their patients alike. Making personalized solutions accessible is of paramount importance to Hims & Hers and we are focused on continued innovation in this space. Hims & Hers offers nonprescription products and access to highly personalized prescription solutions for a variety of conditions related to mental health, sexual health, hair care, skincare, heart health, and more.

Hims & Hers is a public company, traded on the NYSE under the ticker symbol “HIMS”. To learn more about the brand and offerings, you can visit hims.com and forhers.com, or visit our investor site. For information on the company’s outstanding benefits, culture, and its talent-first flexible/remote work approach, see below and visit www.hims.com/careers-professionals.

About the Role:

We are seeking a Lead Site Reliability Engineer to help build a reliable web experience for our users. We believe that moving fast is our competitive advantage and enables us to better serve our users. We also know that the faster we move, the more likely we are to break things.

You Will:

  • Design and implement SRE practices ensuring availability, scalability and observability of production systems with a strong focus on excellent customer experience
  • Actively seek and identify opportunities to improve the availability and performance of the system by applying the learnings from monitoring and observation
  • Use automation extensively to design, configure, manage, and monitor systems in support of our product development teams
  • Apply a strong understanding of infrastructure and infrastructure automation (Infrastructure as Code)
  • Manage incidents and emergency response, track outages, ensure data integrity and engineer releases to promote safe, efficient and rapid deployments
  • Handle emergency response, either while on call or by reacting to symptoms surfaced by monitoring, and escalate when needed
  • Improve the codebase by resolving logic issues, deprecating unused code, etc.
  • Implement monitoring, logging, alerting and SLO reporting
  • Identify Service Level Indicators (SLIs) that will align the team to meet the availability and performance objectives
  • Run blameless root cause analyses (RCAs) on incidents and outages, aggressively looking for answers that will prevent incident recurrence
  • Provide reviews on design documents from internal and external teams
  • Perform more-complex tasks using …