Senior Software Engineer, Embedded Cloud Reliability
San Francisco, CA, United States
About Crunchyroll
WE HELP EVERYONE BELONG. IT’S OUR PURPOSE.
Founded by fans, Crunchyroll delivers the art and culture of anime to a passionate community. We super-serve over 100 million anime and manga fans across 200+ countries and territories, and help them connect with the stories and characters they crave. Whether that experience is online or in-person, streaming video, theatrical, games, merchandise, events and more, it’s powered by the anime content we all love.
Join our team, and help us shape the future of anime!
Who We Are
We're a cast of characters working to shine a spotlight on anime. Crunchyroll is an international business focused on creating both online and offline experiences for fans through content (licensed, co-produced, originals, distribution), merchandise, events, gaming, news, and more. Visit our About Us pages for more information about our collection of brands.
About the Team
At Crunchyroll, our platforms and infrastructure form the foundation on which our services are built, and they directly influence our customer experience and the velocity of our engineers. The Cloud Reliability team at Crunchyroll embeds with our development teams and partners with our core platform teams to deliver the critical cloud infrastructure that enables our services. You will report to our Senior Manager, and this role can be fully remote.
About you
- 8+ years of experience building and running high-volume, customer-facing services in highly dynamic environments, in Software Engineering, Site Reliability, or related roles.
- BS Degree in Computer Science or a related field.
- Proficient in at least one programming language (Go, Python, TypeScript) with experience in a software engineering environment.
- Experienced in automation, infra as code, and making reusable patterns.
- Passionate about improving the reliability and performance of critical services through the use of monitoring, metrics, incident management, and proactive engineering. Has helped investigate and remediate critical issues in production services and infrastructure.
- Expert in observability tools such as Datadog, with hands-on experience instrumenting services for monitoring, logging, metrics collection, and tracing.
- Knowledgeable in performance and load testing tools and methods to simulate production workloads.
- Acts with urgency, ownership, and a mindset of continuous improvement.
- Able to participate in an on-call rotation to ensure issues are resolved as quickly as possible and prevented from recurring.
- Experienced in GitOps practices and technologies (CI/CD, Infrastructure as Code, etc.).
- …
Job Profile
Fully remote
Benefits/Perks: Commuter benefit, Fully remote, Great compensation package, Pet Insurance
Skills: Automation, AWS, CI/CD, Datadog, DynamoDB, EC2, ECS, EKS, ElasticSearch, GCP, GitOps, GKE, Go, Incident Management, Infrastructure as Code, Lambda, Load Testing, Metrics, Monitoring, Observability tools, Performance Testing, Python, RDS, Relational databases, S3, Site Reliability, Software Engineering, SQS, TypeScript
Experience: 8 years
Education: B.S. in Computer Science, Engineering, Related Field
Timezones: America/Anchorage, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, Pacific/Honolulu, UTC-10, UTC-5, UTC-6, UTC-7, UTC-8, UTC-9