FreshRemote.Work

Principal Software Engineer, Architecture (AI/ML)

Austin

Do you ever wonder what happens inside the cloud?

DigitalOcean (NYSE: DOCN) simplifies cloud computing so builders can spend more time creating software that changes the world. With our mission-critical infrastructure and fully managed offerings, DigitalOcean enables startups and small and medium-sized businesses (SMBs) to rapidly deploy and scale modern applications. As a remote-first organization, our employees, like our customers, are based around the world.

We want people who are passionate about staying on top of the latest cloud infrastructure and AI/ML trends, with an excellent aptitude for supporting internal employees and teams.

We are looking for a highly experienced, highly motivated Principal Software Engineer, Architecture (AI/ML) with a Computer Science, Engineering, or AI/ML background. You will be involved in the architecture, design, implementation, verification, and integration of the next generation of DigitalOcean Cloud Computing software with a strong emphasis on AI/ML-driven solutions.

What You’ll Be Doing:

  • Working at the forefront of cloud, distributed computing, and AI/ML technologies.
  • Serving as the architect driving the technical strategy and direction for our large-scale cloud services, including machine learning model deployment and orchestration.
  • Developing AI/ML models to optimize cloud infrastructure, improve system reliability, and enhance user experience.
  • Building and refining machine learning pipelines and frameworks to support scalable AI/ML solutions.
  • Owning the primary responsibility for establishing a pragmatic long-term technical direction for our software services, ensuring alignment with our customers, business goals, and internal teams.
  • Leading a team of highly passionate technical leads to evolve our service architecture, with alignment across several product technical roadmaps.
  • Leading by example through direct contribution and providing direction in establishing development and operational practices, with specific attention to AI/ML model lifecycle management.
  • Serving as the technical lead on our most demanding, cross-functional projects.
  • Actively mentoring individuals and the engineering community on advanced technical issues, including best practices in AI/ML.

What We’ll Expect From You:

Architect-level experience in the following domains:

    • Proven expertise in large-scale cloud and AI/ML services, and a deep understanding of cloud computing’s potential to enhance AI/ML applications.
    • Demonstrated ability to lead and mentor large software and AI/ML teams.
    • Experience with web and cloud-native services is a must, including experience deploying scalable AI/ML solutions in production.
    • Adept at Systems Thinking, with an ability to decompose complex problems into simple, straightforward solutions, including AI/ML-specific challenges such as model drift and data dependency management.
    • Strong grasp of system interdependencies and limitations, with expertise in AI/ML optimization techniques for performance, scalability, and accuracy.

AI/ML Expertise:

    • Hands-on experience with AI/ML frameworks and libraries such as TensorFlow, PyTorch, or scikit-learn, and with model-serving and interchange technologies such as TensorFlow Serving or ONNX.
    • Proven experience in developing and deploying models for performance-intensive applications at web scale.
    • Understanding of the MLOps lifecycle, including data engineering, model training, validation, deployment, and monitoring.
    • Understanding of key HPC technologies, including RDMA, InfiniBand/RoCE, GPUDirect, and related networking and storage technologies.
  • Knowledge of performance, scalability, enterprise system architecture, and engineering best practices, with an emphasis on the integration of AI/ML.
  • Ability to leverage open-source software, industry standards, and prior art in architecture decisions that involve AI/ML considerations.
  • Ability to balance technical leadership and savvy with strong business judgment to make the right decisions about technology, demonstrating simplicity and creativity.
  • Master’s degree or higher preferred in Computer Science, AI/ML, or a related field.
  • 15+ years of professional experience in web-scale system software development.
  • 5+ years of experience with an established track record in deep learning and machine learning.
  • 3+ years of recent experience as an ML engineer, data science engineer, or in a similar role.
  • In-depth experience in two or more of the following areas: Cloud Computing, Storage, Networking, Platform-as-a-Service, Infrastructure-as-a-Service, Software-as-a-Service.
  • Excellent communication skills at all levels.

Why You’ll Like Working for DigitalOcean:

  • We are proud to work here. You’ll be part of a cutting-edge technology company with an upward trajectory that is proud to simplify cloud computing so builders can spend more time creating software that changes the world. As a member of the team, you will be a Shark who thinks big, bold, and scrappy, like an owner with a bias for action and a powerful sense of responsibility for customers, products, employees, and decisions.
  • We prioritize career development. At DO, you’ll do the best work of your career. You will work with some of the smartest and most interesting people in the industry. We are a high-performance organization that will always challenge you to think big. Our organizational development team will provide you with resources to ensure you keep growing. We provide employees with reimbursement for relevant conferences, training, and education. All employees have access to LinkedIn Learning's 10,000+ courses to support their continued growth and development.
  • We care about your well-being. Regardless of your location, we will provide you with a competitive array of benefits to support your overall well-being, from a one-time work-from-home stipend to a wellness allowance to a flexible time-off policy, to name a few. While the philosophy around our benefits is the same worldwide, specific benefits may vary based on local regulations and preferences.
  • We reward our employees. The salary range for this position is between $225,000.00 and $338,000.00, based on market data, relevant years of experience, and skills. You may qualify for a bonus in addition to base salary; bonus amounts are determined based on company and individual performance. We also provide equity compensation to eligible employees, including equity grants upon hire and the option to participate in our Employee Stock Purchase Program.
  • We value diversity and inclusion. We are an equal-opportunity employer, and recognize that diversity of thought and background builds stronger teams and products to serve our customers. We approach diversity and inclusion seriously and thoughtfully. We do not discriminate on the basis of race, religion, color, ancestry, national origin, caste, sex, sexual orientation, gender, gender identity or expression, age, disability, medical condition, pregnancy, genetic makeup, marital status, or military service.

*This is a remote role

Job Profile

Regions

North America

Countries

United States

Restrictions

Remote-first organization

Benefits/Perks

Access to LinkedIn Learning, Career development, Cutting-edge technology, Employee Stock Purchase Program, Equity compensation, Flexible time off, High-performance organization, Mentorship opportunities, Opportunity to work with cutting-edge technology, Remote-first organization, Remote-first work environment, Upward trajectory, Wellness allowance, Work-from-home stipend

Tasks
  • Architecting cloud services
  • Developing AI/ML models
  • Establishing technical direction
  • Leading technical teams
  • Mentoring engineers

Skills

Accuracy, AI, AI/ML, Benefits, Cloud Computing, Cloud Infrastructure, Cloud-native Services, Cloud Services, Communication, Compensation, Data dependency management, Data engineering, Design, Distributed computing, Engineering Best Practices, Enterprise Architecture, Enterprise system architecture, GPUDirect, HPC technologies, InfiniBand, Leadership, Lifecycle Management, Machine Learning, Mentoring, ML, MLOps, Model Deployment, Model drift, Model serving, Model training, Networking, ONNX, Organization, Organizational Performance, Performance Optimization, PyTorch, RDMA, RoCE, Scalability, Scikit-learn, Software Development, Storage, System architecture, Systems Thinking, Technical Leadership, TensorFlow, TensorFlow Serving, Training, User Experience

Experience

15+ years

Education

AI, Computer Science, Engineering, ML

Timezones

America/Anchorage, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, Pacific/Honolulu, UTC-10, UTC-5, UTC-6, UTC-7, UTC-8, UTC-9