Staff Site Reliability Engineer - Incident Response
Remote - Washington, USA
About Zscaler
Serving thousands of enterprise customers around the world including 40% of Fortune 500 companies, Zscaler (NASDAQ: ZS) was founded in 2007 with a mission to make the cloud a safe place to do business and a more enjoyable experience for enterprise users. As the operator of the world’s largest security cloud, Zscaler accelerates digital transformation so enterprises can be more agile, efficient, resilient, and secure. The pioneering, AI-powered Zscaler Zero Trust Exchange™ platform protects thousands of enterprise customers from cyberattacks and data loss by securely connecting users, devices, and applications in any location.
Named a Best Workplace in Technology by Fortune and others, Zscaler fosters an inclusive and supportive culture that is home to some of the brightest minds in the industry. If you thrive in an environment that is fast-paced and collaborative, and you are passionate about building and innovating for the greater good, come make your next move with Zscaler.
Our Engineering team built the world's largest cloud security platform from the ground up, and we keep building. With more than 100 patents and big plans for enhancing services and increasing our global footprint, the team has made us and our multitenant architecture today's cloud security leader, with more than 15 million users in 185 countries. Bring your vision and passion to our team of cloud architects, software engineers, security experts, and more who are enabling organizations worldwide to harness speed and agility with a cloud-first strategy.
NOTE: U.S. citizenship is required for this position due to the nature of the customers assigned to this role.
We're looking for an experienced Staff Site Reliability Engineer - Incident Response to join our Shared Platform Engineer team. Reporting to the Director, Cloud Operations and Incident Management, you will:
- Lead and advocate for the transformation to a world-leading SRE organization, promoting SRE principles within the Engineering Department.
- Provide expert leadership during critical outages, coordinating multiple teams to ensure streamlined decision-making and quick resolution.
- Promote a customer-focused approach by addressing and mitigating global customer environment issues, and foster a culture of continuous learning and technical excellence within the SRE team.
- Develop and implement scalable process frameworks and observability strategies to ensure rapid problem diagnosis, response, and service reliability.
- Collaborate with product teams to thoroughly analyze failures and integrate insights to improve service reliability, scalability, and operational efficiency.
What …
Job Profile
U.S. citizenship required
Benefits/Perks: Continuous learning, Education reimbursement, Equity, Health plans, Inclusive benefits, Inclusive culture, In-office perks, Parental leave options, Remote work option, Retirement options, Supportive culture, Time off plans for vacation
Tasks: Collaborate with product teams
Agile, AI, Automation, BGP, Cloud operations, Cloud Security, Datadog, Digital Transformation, Engineering, FedRAMP, Grafana, Incident Management, Incident Response, IPsec, Leadership, Linux, Management, Networking, Observability, Operational Efficiency, Python, Recruiting, Reporting, Scripting, Security, Site Reliability Engineering, Software, Splunk, SRE, SSL, SSL/TLS, Support, Systems Engineering, TCP/IP, Tools, Training, Transformation
Experience: 5 years
Education: Bachelor's degree, Computer Science, Business, Engineering, Equivalent technical field
Timezones: America/Anchorage, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, Pacific/Honolulu, UTC-10, UTC-5, UTC-6, UTC-7, UTC-8, UTC-9