Software Platform Support Engineer - GPU Cloud
US, CA, Remote, United States
The NVIDIA DGX Cloud organization is looking for passionate software support engineers to partner closely with our internal customers and support them on our internal platforms. This partnership requires you to gain a deep understanding of customer needs and how their applications work, assist them in troubleshooting issues, and create documentation that makes it easier for users to troubleshoot issues themselves. The support you provide will give our users a better experience and help shape our platform.
We expect you to have knowledge of supporting cloud-based deployments across compute, storage and networking environments.
What you will be doing:
Partner with multiple internal teams to provide Tier 1 support for complex cloud platforms
Triage and investigate the root cause of customer issues, escalating as needed
File bugs and report issues while working closely with the Site Reliability team
Build tooling to improve the customer support process and visibility
Document best practices, solutions, knowledge base articles, how-tos, and blog posts
Deeply understand user workloads and use cases
Partner with multiple internal teams to give feedback to engineering teams and develop solutions to aid in their success
Be part of an on-call rotation to support production systems
What we need to see:
BS/MS degree in Computer Science or a related area (or equivalent experience)
2+ years of experience supporting distributed software systems
2+ years of experience supporting end-user software platforms
2+ years of experience with Linux
Experience with Kubernetes as well as experience with AWS, Azure, OCI, and GCP
Background in infrastructure, networking, storage, and DevOps scripting/tooling
Understanding of data storage technologies (databases, file, block, blob)
Willingness to become an expert in DGX Cloud
Customer service/support experience
Willingness to work up and down the stack as well as across multiple teams
Strong troubleshooting skills and outstanding communication skills
Ways to stand out from the crowd:
Previous SLURM or HPC experience
Machine Learning and/or AI experience (self-taught is great!)
A strong drive to work with internal customers and make them successful
A drive to improve processes, backed by strong organizational skills
NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction …