Staff Site Reliability Engineer - AWS/EKS
United States - Remote
About Us:
SentinelOne is defining the future of cybersecurity through our XDR platform that automatically prevents, detects, and responds to threats in real time. Singularity XDR ingests data and leverages our patented AI models to deliver autonomous protection. With SentinelOne, organizations gain full transparency into everything happening across the network at machine speed – to defeat every attack, at every stage of the threat lifecycle.
We are a values-driven team where names are known, results are rewarded, and friendships are formed. Trust, accountability, relentlessness, ingenuity, and OneSentinel define the pillars of our collaborative and unified global culture. We're looking for people who will drive team success and collaboration across SentinelOne. If you're enthusiastic about innovative approaches to problem-solving, we would love to speak with you about joining our team!
Please note that under Federal & FedRAMP regulations, hiring for this role is limited to US citizens.
FedRAMP staff may be subject to customer or third-party background checks, up to and including secret clearance, if required by their role at SentinelOne.
What Are We Looking For?
We are looking for an experienced SRE, well-versed in large-scale SaaS or cloud engineering environments. As a Site Reliability Engineer, your primary responsibility will be the stability, reliability, and scalability of SentinelOne’s products and services. In this job, you will have an opportunity to help design, implement, and maintain robust infrastructure, complex distributed systems and related areas such as monitoring and automation. Someone who has driven continuous deployment, has provided engineering leadership and expertise for complex incidents and corresponding post-incident reviews, has provided feedback to development teams on architecture decisions, and has automated repetitive operational tasks would be a great fit.
What Will You Do?
- Support the stability, reliability, and scalability of SentinelOne’s distributed systems through various tasks performed by the Site Reliability Engineering organization including managing Kubernetes, creating IaC, and leading troubleshooting during incident response
- Identify areas for improvement, such as performance issues and availability concerns, and perform technical and architectural reviews in partnership with fellow engineering teams to improve the overall reliability of SentinelOne systems
- Design and implement comprehensive monitoring and alerting, as well as concepts such as SLIs/SLOs and critical user journeys to provide deeper insight into the performance and availability of SentinelOne’s systems
- Analyze systems, identify toil, and develop and implement strategies such as automation to streamline …
Job Profile
US Citizens only
Benefits/Perks
- Cell phone reimbursement
- Commuter
- Company-sponsored events
- Cutting-edge company
- Dental
- Dependent FSA
- Disability and life
- Disability and Life Insurance
- Employee Assistance Program
- Employee Stock Purchase
- Employee Stock Purchase Program
- Extraordinary challenges
- Gender-neutral parental leave
- Gym Membership Reimbursement
- Health and Dependent FSA
- Insurance
- Life Insurance
- Medical
- Medical, Vision, Dental
- Paid company holidays
- Paid sick time
- Parental leave
- Sick time
- Stock Purchase Program
- Unlimited PTO
- Vision
Tasks
- Automate operational tasks
AI, AI models, ArgoCD, Automation, AWS, CI, CI/CD, Cloud Engineering, Cloud-native Services, Communication, Cybersecurity, Distributed Systems, EKS, FedRAMP, GCP, Golang, IaC, Incident Response, Infrastructure as Code, Java, Javascript, Jenkins, Kubernetes, Leadership, Mesos, Monitoring, Nomad, Problem-solving, Python, Recruiting, Ruby, SaaS, Scripting, Site Reliability Engineering, SRE, Technical, XDR, XDR platform
Experience
7 years
Certifications
Timezones
America/Anchorage, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, Pacific/Honolulu, UTC-10, UTC-5, UTC-6, UTC-7, UTC-8, UTC-9