
Senior Software Engineer, Embedded Cloud Reliability

San Francisco, CA, United States

About Crunchyroll

WE HELP EVERYONE BELONG. IT’S OUR PURPOSE.

Founded by fans, Crunchyroll delivers the art and culture of anime to a passionate community. We super-serve over 100 million anime and manga fans across 200+ countries and territories, and help them connect with the stories and characters they crave. Whether that experience is online or in-person, streaming video, theatrical, games, merchandise, events and more, it’s powered by the anime content we all love.

Join our team, and help us shape the future of anime!

Who We Are

We're a cast of characters working to shine a spotlight on anime. Crunchyroll is an international business focused on creating both online and offline experiences for fans through content (licensed, co-produced, originals, distribution), merchandise, events, gaming, news, and more. Visit our About Us pages for more information about our collection of brands.

About the Team

At Crunchyroll, our platforms and infrastructure form the foundation on which our services are built, and they directly influence our customer experience and the velocity of our engineers. The Cloud Reliability team at Crunchyroll embeds with our development teams and partners with our core platform teams to deliver the critical cloud infrastructure that enables our services. You will report to our Senior Manager, and this role can be fully remote.

About you

  • 8+ years of experience building and running high-volume, customer-facing services in highly dynamic environments, in Software Engineering, Site Reliability, or related roles.
  • BS Degree in Computer Science or a related field. 
  • Proficient in at least one programming language (Go, Python, TypeScript) with experience in a software engineering environment. 
  • Experienced in automation, infrastructure as code, and building reusable patterns.
  • Passionate about improving the reliability and performance of critical services through the use of monitoring, metrics, incident management, and proactive engineering. Has helped investigate and remediate critical issues in production services and infrastructure.
  • Expert in observability tools such as Datadog, with hands-on experience instrumenting services for monitoring, logging, metrics collection, and tracing.
  • Knowledgeable in performance and load testing tools and methods to simulate production workloads.
  • Acts with urgency, ownership, and a mindset of continuous improvement.
  • Able to participate in an on-call rotation to ensure issues are resolved as quickly as possible and prevented from recurring.
  • Experienced in GitOps practices and technologies (CI/CD, Infrastructure as Code, etc.).