FreshRemote.Work

Linux Support Engineer III

Remote (US)

Lambda's GPU cloud is used by deep learning engineers at Stanford, Berkeley, and MIT. Lambda's on-prem systems power research and engineering at Intel, Microsoft, Kaiser Permanente, major universities, and the Department of Defense.

If you'd like to build the world's best deep learning cloud, join us. 

What You’ll Do 

  • Engage directly with customers to deeply understand their challenges, ensuring a personalized and effective support experience.
  • Dive into complex software and hardware issues, providing timely and efficient solutions.
  • Craft comprehensive documentation of solutions and contribute to enhancing support procedures, ensuring continuous improvement in service quality.
  • Identify common customer pain points and collaborate closely with engineering teams to develop innovative solutions, constantly improving the overall customer experience.
  • Collaborate in the development of new products, contributing your expertise to shape the future of deep learning cloud infrastructure.
  • Take escalations from your peers while looking for opportunities to train and educate them in the process.
  • Participate in a rotating on-call schedule where you’ll be responsible for major incidents and customer issues.

You

  • Proven experience in clustered or HPC environments, showcasing your mastery of Linux administration
  • Experience in administering the entire infrastructure stack, from hardware, to firmware and drivers, to OS and software support.
  • Proficiency in Bash or Python scripting, enabling you to automate tasks and streamline operations
  • Excellent written and oral communication skills, ensuring effective interaction with both technical and non-technical stakeholders
  • Hands-on experience in private or hybrid cloud environments, such as Azure, GCP, AWS, OCI, or OpenStack

Nice to Have 

  • Professional experience administering Kubernetes in cluster environments.
  • Familiarity with infrastructure-as-code tools (Terraform, Puppet, Ansible, Chef, etc.)
  • Experience with high throughput networking technologies, RDMA, NCCL, GPUDirect, SLURM, or distributed GPU training systems
  • Experience with ML/AI/deep learning and a solid understanding of the hardware, software, and tools used in this domain
  • Experience with NVIDIA data center GPUs and distributed or parallel file systems

About Lambda

  • We offer generous cash & equity compensation
  • Investors include Gradient Ventures, Google’s AI-focused venture fund
  • We are experiencing extremely high demand for our systems, with quarter-over-quarter and year-over-year profitability
  • Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
  • We have a wildly talented team of 150 that is growing fast
  • Health, dental, and vision coverage for you and your dependents
  • Commuter/Work from home stipends
  • 401k Plan
  • Flexible Paid Time Off Plan that we all actually use

Salary Range …
