FreshRemote.Work

Senior Software Engineer, SRE Tools & Telemetry

Remote, CA, US, USA 519431

Job ID: 262965
Location Name: FSC REMOTE SF/NY/DC -173(USA_0173)
Address: FSC, Remote, CA 94105, United States (US)
Job Type: Full Time
Position Type: Regular
Job Function: Information Technology
Remote Eligible: Remote

 

Company Overview:

At Sephora we inspire our customers, empower our teams, and help them become the best versions of themselves. We create an environment where people are valued, and differences are celebrated. Every day, our teams across the world bring to life our purpose: to expand the way the world sees beauty by empowering the Extra Ordinary in each of us. We are united by a common goal - to reimagine the future of beauty.

 

The Opportunity:

Technology

Our technology team works fast and smart. With San Francisco as our home, we take bringing new tech to market seriously, developing the latest in mobile technologies, scalable architecture, and the coolest in-store client experience. We love what we do, and we have fun doing it. The Technology group is composed of motivated self-starters and true team players who are integral to the growth of Sephora and our future success. The SRE Tools & Telemetry team is at the forefront of observability and automation, ensuring the reliability and performance of our cloud-native infrastructure and applications. If you're passionate about improving system reliability through automation and scalable tools, join us to make a significant impact.

 

Your role at Sephora:

As a Senior Software Engineer in the SRE Tools & Telemetry team, you will design, build, and maintain robust observability solutions across cloud and on-prem infrastructure. You will work closely with SRE, DevOps, and software engineering teams to optimize performance monitoring, logging, and incident response automation. Your expertise in observability tooling, cloud platforms, and automation will drive efficiency and reliability improvements across the organization.

 

Responsibilities:

  • Design, develop, and enhance monitoring, logging, and tracing solutions using tools like Splunk, Prometheus, Grafana, OpenTelemetry, and Dynatrace.
  • Develop automated alerting and self-healing capabilities leveraging Terraform, Ansible, and Kubernetes-native solutions.
  • Enhance the performance and scalability of telemetry pipelines to handle large-scale distributed systems.
  • Partner with SRE and engineering teams to define SLAs, SLOs, and error budgets; drive incident management improvements.
  • Implement secure logging, audit trails, and anomaly detection mechanisms to maintain compliance and security best practices.

 

We're excited about you if you have:
