Senior Systems Engineer, Compute Platform - Autonomous Vehicles
Canada, Remote
We are looking for Senior Engineers to help scale our cloud compute platform for Autonomous Vehicles (AV). Our platform provides access to hundreds of petabytes of data and exascale GPU and CPU compute for AV workloads, including data ingestion, data processing, and model training. We are building the next generation of the platform and are looking for strong engineers to join us.
What you'll be doing:
Enhance and scale our compute platform to support diverse workloads on GPUs and CPUs
Design and build scalable, distributed services to power large-scale workloads
Design and build scalable tools to efficiently operate services and hardware clusters
Collaborate with multiple teams to understand their needs, and build functionality that improves their user experience and productivity
Participate in operations, on-call, and user support
What we need to see:
BS, MS, or PhD in Computer Science, Engineering, or another technical field, or equivalent experience
6+ years of experience developing and operating backend systems at scale
Proficiency in Golang and distributed systems
Deep care for user experience
Strong collaboration and communication skills
You are extremely motivated, highly passionate, and curious about state-of-the-art technologies, and you follow them closely
Strong willingness to learn, listen to diverse opinions, and contribute to an inclusive and growth-oriented culture
Ways to stand out from the crowd:
Prior background in building AI Infrastructure for Autonomous Vehicles
Familiarity with HPC and workload managers (e.g. SLURM)
Experience with workflow orchestration systems (e.g. Flyte, Kubeflow Pipelines, Airflow)
Experience managing and deploying services on the cloud (e.g. AWS, GCP)
Open source contributions
You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.
Timezones: America/Edmonton, America/Moncton, America/Regina, America/St_Johns, America/Toronto, America/Vancouver (UTC-3, UTC-4, UTC-5, UTC-6, UTC-7, UTC-8)