Staff Software Engineer, Infrastructure Architecture - AI/ML

San Francisco

DigitalOcean

USD 282K+ Full Time Senior Mid

Published 4 months ago

Do you ever wonder what happens inside the cloud?

DigitalOcean (NYSE: DOCN) simplifies cloud computing so builders can spend more time creating software that changes the world. With our mission-critical infrastructure and fully managed offerings, DigitalOcean enables startups and small and medium-sized businesses (SMBs) to rapidly deploy and scale modern applications. As a remote-first organization, our employees, like our customers, are based around the world.

We want people who are passionate about designing and operating secure systems at scale.

We are looking for someone passionate about delivering a world-class GPU experience for our developer cloud. If you’re an open source advocate who is familiar with our stack, enjoys working remotely, and is excited about our mission, this role is for you!

At DigitalOcean, we believe in: Creating simple, yet powerful, foundations (with 💕) from which our community can build. The Infrastructure Fleet Organization delivers on this mission by building performant, reliable, modern, efficient, and secure platform foundations for all DigitalOcean products.

Our Stack: C/C++, Python, Go, Linux, libvirt, KVM, QEMU, Ceph

Our Tools: AWX, Chef, Elasticsearch, Git, GitHub Actions, G Suite, Jira, Nomad, Slack, VictoriaMetrics

Our Team: The person filling this position will report to the Sr. Engineering Director of the Infrastructure Fleet Organization (Infra::Fleet). Infra::Fleet is currently composed of 7 teams and is made up of 60 diverse engineers located across the US, Canada, and Europe.

What You’ll Be Doing:

Work with your fellow sharks to design, develop, and optimize the next generation of virtualized GPU infrastructure
Work with customers and stakeholders to define and refine the infrastructure requirements needed to support their AI/ML workloads
Work with infrastructure technical leaders to define infrastructure requirements to store, move, and manipulate large datasets
Guide performance teams on industry-standard testing methodologies and help optimize GPU fabric throughput
Identify security improvements and drive review discussions with internal teams
Influence a culture of engineering excellence through active engagement with DigitalOcean’s Architecture group
Work directly with individual engineering teams to deliver new infrastructure functions and technologies in support of DigitalOcean AI/ML products
Drive technical strategy that influences medium- and long-term roadmaps
Spend 5-20% of your time contributing to open source communities related to our stack and encouraging your fellow sharks to do the same