Principal Engineer, ML Training Platform
Remote U.S.
Mission Summary
Motional is seeking a proactive and collaborative Principal Engineer with a strong background in infrastructure and MLOps to lead our high-visibility ML Training Platform team. In this critical role, you will be the accountable owner of our ML training platform: designing, developing, and maintaining the robust systems that underpin our ML workflows and ensuring optimal performance, scalability, and reliability across production environments. You will drive key engineering objectives and collaborate closely with infrastructure engineering, the ML team, and the owners of our ML evaluation/metrics systems to accelerate training and evaluation and deliver models that improve on-road autonomous driving performance.
To be successful in this role, you understand business priorities and autonomy-critical initiatives, thrive in ambiguous environments, and ultimately deliver models that improve on-road AV performance.
What You'll Be Doing
- Design, build, and maintain scalable ML data processing and model training solutions on AWS cloud infrastructure using Kubernetes
- Optimize training and model performance across various GPU types to improve model training speed and efficiency
- Leverage the PyTorch and Ray frameworks to operate highly available systems at scale
- Drive the execution of technical programs and ensure milestone delivery
- Actively manage and mitigate technical risks
What We're Looking For
- 6+ years of Python software development experience
- Hands-on experience with popular ML frameworks (PyTorch or TensorFlow)
- Hands-on experience scaling ML systems
- Practical experience with large-scale AWS cloud infrastructure using Kubernetes
- Strong problem-solving skills and the ability to evaluate challenges with an objective, data-driven approach
- Excellent programming and software design skills, including debugging, performance analysis, and test design
- Proven track record of operating highly available systems at scale
- Strong collaboration and mentorship skills
Bonus Points
- Hands-on experience with Ray in large-scale environments
- Experience with ML data processing for large-scale deep learning training.
- Experience refactoring ML code written by ML engineers.
The salary range for this role is an estimate based on a wide range of compensation factors, including but not limited to specific skills, experience and expertise, role location, certifications, licenses, and business needs. The estimated compensation range listed in this job posting reflects base salary only. This role may include additional forms of compensation such as a bonus or company equity. The recruiter assigned to this role can share more information about the specific compensation …
Job Profile
Benefits: 401(k), Benefits program, Bonus, Company equity, Dental, Health Savings Accounts, Life Insurance, Medical, Pet Insurance, Vision
Tasks: Collaborate with teams
Skills: Autonomous driving, Autonomous Vehicles, AWS, Cloud Infrastructure, Data processing, Debugging, Deep Learning, Kubernetes, ML, ML frameworks, Model training, Performance analysis, Performance Optimization, Programming, Python, PyTorch, Ray, Scalability, Software design, Software Development, TensorFlow, Test Design
Experience: 6 years
Timezones: America/Anchorage, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, Pacific/Honolulu, UTC-10, UTC-5, UTC-6, UTC-7, UTC-8, UTC-9