Staff Site Reliability Engineer, Infrastructure, Observability
US Remote
We're Cruise, a self-driving service designed for the cities we love.
We’re building the world’s most advanced self-driving vehicles to safely connect people to the places, things, and experiences they care about. We believe self-driving vehicles will help save lives, reshape cities, give back time in transit, and restore freedom of movement for many.
In our cars, you’re free to be yourself. It’s the same here at Cruise. We’re creating a culture that values the experiences and contributions of all of the unique individuals who collectively make up Cruise, so that every employee can do their best work.
Cruise is committed to building a diverse, equitable, and inclusive environment, both in our workplace and in our products. If you are looking to play a part in making a positive impact in the world by advancing the revolutionary work of self-driving cars, come join us. Even if you might not meet every requirement, we strongly encourage you to apply. You might just be the right candidate for us.
The Observability team at Cruise is looking for a Staff Site Reliability Engineer to play a critical role in building out and improving observability systems, tools and the related codebase.
Site Reliability Engineers at Cruise bring specialized knowledge and experience to ensure the reliability, scalability, performance, efficiency, and security of our systems.
What you'll be doing:
Using your software and systems engineering skills to contribute code, perform code reviews, and create technical designs that improve the performance and reliability of observability systems.
Proactively identifying and addressing challenges, turning them into opportunities to improve the state of engineering through observability.
Partnering with software engineering teams to better understand their use cases and guiding engineers in using the existing tools effectively.
Building tools to enable engineers to collect and act on observability signals.
What you must have:
Previous experience as an SRE, Production Engineer, Systems Engineer, or Software Engineer with a focus on distributed systems reliability.
Proficient in designing and developing sophisticated distributed systems, with expertise in one or more high-level programming languages such as Go, Python, Rust, C/C++, or Node.js.
Experience implementing a new technology or service by leading a cross-functional effort.
Experience in designing and implementing large scale systems.
Considerable Linux experience.
Effective collaboration skills to work closely with team members and other engineering teams.
Bonus Points!
Experience with Cloud Platforms such as Amazon Web Services (AWS), Microsoft …
Job Profile
Location: US Remote
Benefits/Perks: Bonus; Competitive salary and benefits; Dependent Care Flexible Spending Account; Fertility benefits; Integrated manufacturing; Life and AD&D; Location-flexible work policy; Long-term incentives; Medical, dental, and vision; Paid parental, medical, family care, and military leave of absence; Paid time off and holidays; Perks Wallet program; Pre-tax commuter benefit plan; Subsidized mental health benefits
Tasks: Perform code reviews
Skills: Amazon Web Services, C, C/C++, Cloud platforms, Distributed systems, Docker, Go, Google Cloud Platform, Grafana, Istio, Kubernetes, Microsoft Azure, Node.js, OpenTelemetry, Prometheus, Python, Rust, Terraform, Web Services
Timezones: America/Anchorage, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, Pacific/Honolulu (UTC-10, UTC-9, UTC-8, UTC-7, UTC-6, UTC-5)