Incident Manager - Distributed US
Remote, United States
Lots of tech companies disrupt, but many fail when they try to scale. We're different. CockroachDB makes it easier for companies to build and scale apps. This is how and why we're helping some of the most innovative companies on the planet. We tackle problems head-on and focus on solutions that create lasting impact.
Because when our customers win, we all win.
The Role
As an Incident Manager at Cockroach Labs, you will oversee the resolution of all types of incidents across internal, hosted cloud, on-premises customer environments, and security/compliance areas. Your responsibilities will include owning incident escalations, documenting processes, maintaining clear communication with customers and stakeholders, and collaborating with cross-functional teams to identify root causes and implement strategies to prevent future incidents. As the founding Incident Manager, you will play a crucial role in shaping the future of Incident Management at Cockroach Labs. You will:
- Manage the full lifecycle of incidents from identification through resolution, ensuring adherence to established incident management protocols across various mediums including cloud-hosted and fleet-wide incidents, customer-hosted cluster incidents, and security incidents.
- Lead and coordinate response efforts across various teams to ensure timely and effective incident resolution.
- Act as an escalation point for critical incidents and assist in leading crisis response processes as required.
- Drive root cause investigations for high impact/high visibility issues.
- Manage communications about incident status, impact, and resolution progress, tailored to both technical and non-technical audiences, including internal stakeholders and external, customer-facing stakeholders.
- Conduct post-incident reviews with cross-functional teams, identifying actionable insights and process optimizations.
- Monitor, evaluate, and report on incident management programs, identifying trends and areas for improvement.
- Assist in the design and implementation of new processes and procedures to handle business growth and maturation.
- Provide rotational on-call support (24x7x365) to ensure incidents are handled promptly and effectively.
The Expectations
In your first 30 days, you will familiarize yourself with CockroachDB, our customers, and our company. We will provide self-guided onboarding, with reading and hands-on material, to help you get to know the company and some of the responsibilities of the role. During this period, you will also start to get acquainted with our incident management protocols and tools, and begin shadowing incident response activities to observe and learn from other team members, with an eye to future improvements and optimizations.
…
Job Profile
Career development opportunities, Dental Insurance, Flexible hours, Flexible time off, Hybrid work, Hybrid work model, Life and Disability insurance, Medical Insurance, Mental wellbeing benefits, Paid holidays, Paid parental leave, Paid Sick Days, Professional development, Professional development funds, Remote work, Stock options, Vision Insurance
Tasks
- Conduct post-incident reviews
CockroachDB, Communication, Cross-functional Collaboration, Documentation, Incident Management, Problem-solving, Process Optimization, Python, Root Cause Analysis, Troubleshooting
Experience
7 years
Education
Bachelor's degree, Computer Science, Information Technology
Timezones
America/Anchorage, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, Pacific/Honolulu, UTC-10, UTC-5, UTC-6, UTC-7, UTC-8, UTC-9