Senior Data Scientist, Machine Learning
Los Angeles, CA
At Serve Robotics, we’re reimagining how things move in cities. Our personable sidewalk robot is our vision for the future. It’s designed to take deliveries away from congested streets, make deliveries available to more people, and benefit local businesses. The Serve fleet has been making commercial deliveries in Los Angeles, delighting merchants, customers, and pedestrians along the way.
We’re looking for talented individuals who will grow robotic deliveries from surprising novelty to efficient ubiquity.
Who We Are
We are tech industry veterans in software, hardware, and design who are pooling our skills to build the future we want to live in. We are solving real-world problems leveraging robotics, machine learning and computer vision, among other disciplines, with a mindful eye towards the end-to-end user experience. Our team is agile, diverse, and driven. We believe that the best way to solve complicated dynamic problems is collaboratively and respectfully.
What you'll be doing
Serve Robotics aims to develop dependable and proficient sidewalk autonomy software. We are looking for a talented Senior Data Scientist who bridges the gap between ML infrastructure and ML engineers. The ideal candidate possesses strong fundamentals in machine learning, with the ability to prototype and train learning-based models using data-centric techniques. This individual should also have expertise in data ETL processes, SQL queries, and building scalable data pipelines to make data accessible for model training.
Responsibilities
Prototype and train learning-based models using a data-centric approach, applying techniques such as automated feature engineering, active learning, and fine-tuning on curated datasets.
Design, develop, and maintain efficient data and feature extraction pipelines to support ML engineers in accessing high-quality data for model training.
Design an auto-labeling system using an ensemble of models that can reason over multi-modal data for different use cases, for example: image semantic labeling using vision-grounded models, or generating ground truth for intent and path prediction.
Perform complex data extraction, transformation, and loading (ETL) processes, ensuring data is clean, accessible, and well-documented. Write and optimize high-quality SQL queries for data analysis and ingestion from various sources.
Partner with data infrastructure and ML engineers to ensure seamless integration of data and machine learning workflows.
Produce high-quality, maintainable code and participate in peer code reviews to share knowledge and uphold team standards.
Qualifications
Master’s in Computer Science, Data Science, or a related technical field and 5+ years of industry experience in data engineering, machine learning, or a similar domain.
Strong proficiency in Python and SQL, with demonstrated experience building data pipelines at scale and ETL workflows that cater to multi-modal data (e.g., images, point clouds, time-series data).
Proven ability to work with petabyte-scale datasets, including structured, semi-structured, and unstructured data.
Hands-on experience working with ML frameworks such as TensorFlow, PyTorch, or similar.
Solid understanding of ML fundamentals and data-centric techniques for model training.
Experience with cloud platforms (GCP, AWS, or Azure) and tools like Kubernetes, Docker, and Airflow.
Excellent communication skills and the ability to collaborate with cross-functional teams.
What makes you stand out
Experience optimizing ML workflows using MLOps tools such as MLflow, TFX, Kubeflow, or similar platforms.
Strong understanding of transformer-based models and their application in data-centric AI workflows.
Knowledgeable in advanced SQL query optimization and ETL pipeline performance tuning.
Familiarity with tools for scalable data engineering, such as Apache Beam, Dask, or BigQuery.
Job Profile
- Collaborate with engineers
- Design and maintain data pipelines
- Perform ETL processes
- Prototype and train models
- Write and optimize SQL queries
Experience: 5 years
Education: Master’s in Computer Science, Data Science, or a related technical field
Timezones: America/Anchorage, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, Pacific/Honolulu (UTC-5 to UTC-10)