
Principal Engineer, ML Training Platform

Remote U.S.

Mission Summary

Motional is seeking a proactive and collaborative Principal Engineer with a strong background in infrastructure and ML Ops to lead our high-visibility ML Training Platform team. In this critical role, you will be the accountable owner of our ML training platform—designing, developing, and maintaining the robust systems that underpin our ML workflows to ensure optimal performance, scalability, and reliability across production environments. You will drive key engineering objectives, collaborate closely with infrastructure engineering, the ML team, and ML evaluation/metrics systems to accelerate training and evaluation processes, and deliver models that enhance on-road autonomous driving performance.

To be successful in this role, you understand business priorities and autonomy-critical initiatives, thrive in ambiguous environments, and ultimately deliver models that improve on-road AV performance.

What You'll Be Doing

  • Design, build, and maintain scalable ML data processing and model training solutions on AWS cloud infrastructure using Kubernetes
  • Optimize training and model performance across various GPUs to improve model training speed and efficiency
  • Leverage PyTorch and Ray to operate highly available systems at scale
  • Drive the execution of technical programs and ensure milestone delivery
  • Actively manage and mitigate technical risks

What We're Looking For

  • 6+ years of Python software development experience
  • Hands-on experience with popular ML frameworks (PyTorch or TensorFlow)
  • Hands-on experience with scaling ML systems
  • Practical experience with large-scale AWS cloud infrastructure using Kubernetes
  • Strong problem-solving skills and ability to evaluate challenges with an objective, data-driven approach
  • Excellent programming and software design skills, including debugging, performance analysis, and test design
  • Proven track record of operating highly available systems at scale
  • Strong collaboration and mentorship skills

Bonus Points

  • Hands-on experience with Ray in large-scale environments.
  • Experience with ML data processing for large-scale deep learning training.
  • Experience refactoring ML code written by ML engineers.

The salary range for this role is an estimate based on a wide range of compensation factors including but not limited to specific skills, experience and expertise, role location, certifications, licenses, and business needs. The estimated compensation range listed in this job posting reflects base salary only. This role may include additional forms of compensation such as a bonus or company equity. The recruiter assigned to this role can share more information about the specific compensation …
