Site Reliability Engineer
United States (Remote)
One’s mission is simple: to help customers achieve financial progress. We’re doing this by creating simple solutions to help our customers save, spend, borrow, and grow their money – all in one place.
The U.S. consumer today deserves better. Millions of Americans can’t access credit or build savings or wealth, and are left to manage their financial lives through multiple disconnected apps. Almost a quarter of U.S. adults are unbanked or underbanked, and roughly 80% of fintech users rely on multiple accounts to manage their finances.
What makes us unique? We are backed by a preeminent fintech investor (Ribbit) and the world’s largest retailer (Walmart), maintain the speed and independence of a startup, and employ a strong (and growing) collection of world-class talent.
There’s never been a better moment to build a business that helps people achieve financial progress. Come build with us!
The role
As a Site Reliability Engineer (SRE) at One, your mandate is to ensure the availability and reliability of our most critical services, and ensure that they meet the requirements of our customers. Our SRE team at One is growing, so you’ll be a crucial early member to help establish the team, processes, and best practices. Success in this role looks like collaborating with other teams to build and run sustainable production systems that can evolve and adapt to the changes in our fast-paced environment.
This role is responsible for:
Working proactively with engineering teams to help them set SLOs and implement best practices for logging and telemetry collection
Designing, implementing, and maintaining the tools and systems that support service reliability, monitoring, and alerting
Participating in a 24x7 on-call rotation supporting the health of our services
Driving the incident management process and supporting a blameless post-mortem culture
Participating in application design consulting and capacity planning
Defining and formalizing SRE practices and helping guide the overall reliability engineering direction
Providing mentorship both formally and informally to engineers at One
Continuously optimizing systems and workflows by improving architecture, infrastructure, automation, CI/CD, and observability
Combining software and systems knowledge to engineer high-volume distributed systems in a reliable, scalable, and fault-tolerant manner
You bring
5+ years of relevant industry experience with a focus on distributed cloud native systems design, observability, operation, maintenance, and troubleshooting
5+ years operational experience with an observability platform like Datadog, Splunk, Prometheus/Grafana, or AppDynamics
Fluency in one or more programming languages (e.g. Python, …
Job Profile
Remote
Benefits/Perks: Competitive cash benefits; Early access to a high potential, high growth fintech; Effective on day one; Flat titling structure; Flexible time off programs; Generous stock option packages; Office friendly; Other available benefits; Pay Transparency; Remote Friendly
Tasks
- Collaborate with engineering teams
- Ensure service availability
- Implement best practices
- Mentor engineers
- Support on call rotation
Alerting Automated Testing Automation Capacity planning CI/CD Cloud Native Systems Datadog Design Distributed Systems Engineering Fintech Go Incident Management Infrastructure Mentorship Monitoring Observability Product Management Python Site Reliability Engineering Typescript Version Control
Experience: 5 years
Timezones: America/Anchorage, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, Pacific/Honolulu, UTC-10, UTC-5, UTC-6, UTC-7, UTC-8, UTC-9