Senior Software Engineer, Vision Language Models

Remote

Motional

USD 175K+ Full Time Senior Mid

Published 2 months ago

Mission Summary

At Motional, data play a critical role in fueling our ML-centered autonomous vehicles. Our robo-taxi fleet collects petabytes of data on the road every day, and the Data Mining team mines and filters this massive influx of fleet data by developing billion-scale data workflows and state-of-the-art mining algorithms. Through our mining and learning frameworks, we continuously improve the on-road performance of ML products for perception, prediction & planning with every mile driven.

We mine for model errors, anomalies, rare objects & long-tail driving scenarios across millions of driving hours – these are used for laser-focused ML model training and continuous edge case validation. We are looking for an engineer to spearhead new mining strategies & workflows and help deliver high-quality data that improve our core ML products.

What you'll be doing:

Develop data products utilizing foundation models such as multi-modal encoding models (CLIP-style models), vision language models (VLMs) and large language models (LLMs) and adapt them to the autonomous driving domain via pre-training, fine-tuning and prompt optimization.
Own large-scale mining workflows that surface rare objects, model errors & long-tail events.
Build high-quality datasets to improve ML products through training & edge case validation.
Contribute to data processing pipelines that fuel our in-house billion-scale image search engine.
Provide statistical depth on model performance & generalization through rigorous error analysis across complex driving scenarios.

What we’re looking for:

BS in computer science, similar discipline or equivalent experience.
3+ years of experience architecting and shipping high-performance & large-scale distributed systems.
Experience deploying vision language models (VLMs) or large-scale vision encoders (e.g. CLIP) in production settings for image/video understanding, object detection or search.
Experience with core cloud services (e.g. AWS’s S3, Athena, RDS or similar) and modern vector databases (OpenSearch, Weaviate, Pinecone etc.).
Solid software engineering principles – such as software design patterns, configuration management, source control, build processes, code reviews, testing methodologies, app containerization, continuous integration etc.
Fluency in Python and experience in production-quality software development.

Bonus points (not required):

MS/PhD in computer science, machine learning, statistics or computer vision.
Experience with at least one of the following ML techniques/models: Few-shot Learning, Metric Learning, Information Retrieval, Recommender Systems, Contrastive Learning, Semi-supervised Learning, Object Detection / Segmentation / Prediction.
Experience with PyTorch or other deep …