Staff Software Engineer, Infrastructure Architecture - AI/ML
Denver
Do you ever wonder what happens inside the cloud?
DigitalOcean (NYSE: DOCN) simplifies cloud computing so builders can spend more time creating software that changes the world. With our mission-critical infrastructure and fully managed offerings, DigitalOcean enables startups and small and medium-sized businesses (SMBs) to rapidly deploy and scale modern applications. As a remote-first organization, our employees, like our customers, are based around the world.
We want people who are passionate about designing and operating secure systems at scale.
We are looking for someone passionate about delivering a world-class GPU experience for our developer cloud. If you're an open source advocate familiar with our stack, who enjoys working remotely and is excited about our mission, this role is for you!
At DigitalOcean, we believe in: Creating simple, yet powerful, foundations (with love) from which our community can build. The Infrastructure Fleet Organization delivers on this mission by building performant, reliable, modern, efficient, and secure platform foundations for all DigitalOcean products.
Our Stack: C/C++, Python, Go, Linux, libvirt, KVM, QEMU, Ceph
Our Tools: AWX, Chef, Elasticsearch, Git, GitHub Actions, G Suite, Jira, Nomad, Slack, VictoriaMetrics
Our Team: The person filling this position will report to the Sr. Engineering Director of the Infrastructure Fleet Organization (Infra::Fleet). Infra::Fleet is currently composed of 7 teams and is made up of 60 diverse engineers located across the US, Canada, and Europe.
What You'll Be Doing:
- Work with your fellow sharks to design, develop, and optimize the next generation of virtualized GPU infrastructure
- Work with customers and stakeholders to define and refine the infrastructure requirements needed to support their AI/ML workloads
- Work with infrastructure technical leaders to define infrastructure requirements to store, move, and manipulate large datasets
- Guide performance teams on industry-standard testing methodologies and help optimize for GPU fabric throughput
- Identify security improvements and drive review discussions with internal teams
- Influence a culture of engineering excellence through active engagement with DigitalOcean's Architecture group
- Work directly with individual engineering teams to deliver new infrastructure functions and technologies in support of DigitalOcean AI/ML products
- Drive technical strategy that influences medium- and long-term roadmaps
- Spend 5-20% of your time contributing to open source communities related to our stack and encouraging your fellow sharks to do the same
What We'll Expect From You:
- Experience …
Job Profile
Remote-first organization
Benefits/Perks: Community engagement, Cutting-edge technology, Cutting-edge technology environment, Employee Stock Purchase Program, Equity Compensation, Remote work
Tasks: Drive technical strategy
Skills: AI/ML, C, C++, CEPH, Chef, ElasticSearch, Git, GitHub, GitHub Actions, Go, G Suite, Jira, KVM, Linux, Networking, Nomad, Python, Slack, Virtualization
Experience: 5 years
Timezones: America/Anchorage, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, Pacific/Honolulu, UTC-10, UTC-5, UTC-6, UTC-7, UTC-8, UTC-9