Senior Site Reliability Engineer - NYC or Remote

Olo is a leading on-demand commerce platform powering the restaurant industry’s digital transformation and personalizing the guest experience to maximize customer lifetime value. Olo customers can build digital experiences with the largest and most flexible restaurant commerce ecosystem on the market. 
Olo is experiencing tremendous growth, and as we enhance our platform to support increased demand, it must be positioned for continued stability, reliability and resiliency. Reporting to the Engineering Manager of Site Reliability, the Site Reliability Engineer will partner with Engineering and Product Managers to learn, improve system availability and sharpen our execution skills to provide an amazing experience for our customers.
You can work remotely from anywhere in the U.S. or at Olo’s headquarters in NYC.

What You’ll Do

  • Guide the full reliability lifecycle, from observability and SLIs/SLOs through Incident Response to postmortems and follow-up actions.
  • Implement and tailor our incident response tools to minimize outage durations.
  • Build collaborative monitoring solutions with members across multiple product teams.
  • Contribute insights across teams to help us improve or re-architect existing systems to support scale, performance and extensibility.
  • Rethink our observability tooling to improve architecture, knowledge models, user experience, performance and stability.
  • Analyze and mature our processes around Incident Response, Observability, Postmortems and Predictive Monitoring.
  • Influence an engineering culture of reliability, observability, and availability.
  • Participate in an Incident Commander on-call rotation, driving remediation efforts during incidents across our Platform to improve our user experience.
  • Mentor engineering teams through game days, SRE boot camps and other training and feedback channels.

What We’ll Expect From You

  • 3+ years of professional experience building scalable, efficient, and resilient systems.
  • Experience with monitoring tools like Datadog, Sumo Logic, Raygun, New Relic, Grafana, CloudWatch, and Splunk SignalFx.
  • Fluency in Incident …
