Incident Response Manager
Remote
Who we are
About Stripe
Stripe is a financial infrastructure platform for businesses. Millions of companies—from the world’s largest enterprises to the most ambitious startups—use Stripe to accept payments, grow their revenue, and accelerate new business opportunities. Our mission is to increase the GDP of the internet, and we have a staggering amount of work ahead. That means you have an unprecedented opportunity to put the global economy within everyone’s reach while doing the most important work of your career.
About the team
The Incident Ops team is a global 24/7 team responsible for driving incident response and management from detection to resolution. Stripe is proud of its five 9s reliability, and this team is at the forefront of keeping it that way, working hand-in-hand with Reliability Eng and across the Tech Org. This team of incident response managers (IRMs) is defined by its sense of ownership and by how it drives incidents to resolution: marshaling the cross-functional resources needed to respond to and resolve service outages, critical bugs, security attacks and anything else that significantly impacts the users of our products. The team is user-first and ensures appropriate external communications from Stripe and senior management to keep our users informed of disruption to their experience of Stripe. Because incidents can arise anywhere and cut across products and orgs in Stripe, the team combines strong communication and incident-handling skills with technical adeptness.
What you’ll do
As an Incident Response Manager (IRM), you’ll play the key role in driving the right level of response from Stripes to incidents: determining impact, rallying Stripes to mitigate, communicating to users, ensuring appropriate remediations and orchestrating the Root Cause Analysis (RCA) process. You’ll work hand-in-hand with IRMs and engineers globally to ensure solid 24/7 coverage of how we monitor, detect, respond to, communicate about and mitigate incidents. When not managing incidents, you'll help scale our ability to respond to incidents, improve our operations, analyze data to provide insights and deepen our technical expertise in products. As a result, you’ll be seen as the protector of our users, minimizing the impact of incidents on their business and ensuring that Stripe is always thinking of our users.
Responsibilities
- Act as an on-call Incident Commander, responsible for driving and managing incident resolution with a high level of urgency, cross-functional collaboration, and accuracy, while partnering with a global and diverse set of teams, including Engineering, Product, Policy, Risks, PR, Legal, Execs, etc.
- Lead all user-facing incidents across domains at Stripe - including reliability, technical, security, and data privacy
- Take a "User First" approach to determining impact, providing accurate situation reports, facilitating comms bridges, and ensuring useful and timely external communications to users
- Proactively update internal stakeholders, make decisions through data and influence by partnering with Engineering, Sales, Support and other cross-functional teams
- Contribute to the root cause analysis process by conducting post-mortems, identifying remediations, and ensuring problem management tasks meet SLA and user expectations
- Drive improvements in the incident handling process and incident management metrics and tooling based on trends and data of Stripe's incidents in collaboration with engineering, product and operations teams
- Collaborate closely with leadership to build team strategy based on the team vision
- Collaborate and coach other Incident Response Managers on the team
Who you are
We’re looking for someone who meets the minimum requirements to be considered for the role. If you meet these requirements, you are encouraged to apply. The preferred qualifications are a bonus, not a requirement.
Minimum requirements
- 5+ years of demonstrable major incident experience for organizations that run mission-critical applications or always-on SaaS environments.
- Demonstrated ability to lead multiple incidents concurrently with authority, and to influence responders, using agency and reasoning skills to resolve ambiguous problems and drive to root cause.
- Strong full stack technical skills with development/support experience with cloud-based technologies
- Demonstrated experience developing code and automation using Python, Ruby, JavaScript or shell scripting.
- Solid understanding of infrastructure, including physical, virtual, and container-based compute platforms
- Strong quantitative and analytical skills in data manipulation using SQL, Splunk or other tools.
- Excellent task management skills; must be detail-oriented, with the ability to remain composed, be methodical, and think fast in a high-pressure environment.
- Exceptional written and verbal English communication skills, with the ability to translate complex technical issues for internal and external stakeholders
Preferred qualifications
- Domain expertise in classes of incidents such as technical, privacy, security or crisis with a strong desire to continuously learn about Stripe's products, technical issues and systems.
- Ability to review complex technical details regarding ongoing issues/events and convey the key details to senior stakeholders to facilitate real-time decision making.
- Experience with broad user-facing communications (e.g. status pages, tweets) and/or targeted communications (e.g. direct emails, support ticket responses).
- Familiarity with operating or managing distributed architectures, with the ability to correlate system behaviors based on known inter-dependencies.
- Demonstrated experience with full stack development and support
Working remotely at Stripe
A remote location, in most cases, is defined as being 35 miles (56 kilometers) or more from one of our offices. While you would be welcome to come into the office for team/business meetings, on-sites, meet-ups, and events, our expectation is you would regularly work from home rather than a Stripe office. Stripe does not cover the cost of relocating to a remote location. We encourage you to apply for roles that match the location where you currently live or plan to live.
Pay and benefits
The annual US base salary range for this role is $180,200 - $270,300. For sales roles, the range provided is the role’s On Target Earnings ("OTE") range, meaning that the range includes both the sales commissions/sales bonuses target and annual base salary for the role. This salary range may be inclusive of several career levels at Stripe and will be narrowed during the interview process based on a number of factors, including the candidate’s experience, qualifications, and location. Applicants interested in this role and who are not located in the US may request the annual salary range for their location during the interview process.
Additional benefits for this role may include: equity, company bonus or sales commissions/bonuses; 401(k) plan; medical, dental, and vision benefits; and wellness stipends.
We look forward to hearing from you
At Stripe, we're looking for people with passion, grit, and integrity. You're encouraged to apply even if your experience doesn't precisely match the job description. Your skills and passion will stand out and set you apart, especially if your career has taken some extraordinary twists and turns. At Stripe, we welcome diverse perspectives and people who think rigorously and aren't afraid to challenge assumptions. Join us.