Linux Support Engineer III - Remote (US)

Lambda's GPU cloud is used by deep learning engineers at Stanford, Berkeley, and MIT. Lambda's on-prem systems power research and engineering at Intel, Microsoft, Kaiser Permanente, major universities, and the Department of Defense.

If you'd like to build the world's best deep learning cloud, join us. 

What You’ll Do 

  • Engage directly with customers to deeply understand their challenges, ensuring a personalized and effective support experience.
  • Dive into complex software and hardware issues, providing timely and efficient solutions.
  • Craft comprehensive documentation of solutions and contribute to enhancing support procedures, ensuring continuous improvement in service quality.
  • Identify common customer pain points and collaborate closely with engineering teams to develop innovative solutions, constantly improving the overall customer experience.
  • Collaborate in the development of new products, contributing your expertise to shape the future of deep learning cloud infrastructure.
  • Take escalations from your peers while looking for opportunities to train and educate them in the process.
  • Participate in a rotating on-call schedule where you’ll be responsible for major incidents and customer issues.

You

  • Proven experience in clustered or HPC environments, demonstrating mastery of Linux administration
  • Experience in administering the entire infrastructure stack, from hardware, to firmware and drivers, to OS and software support.
  • Proficiency in Bash or Python scripting, enabling you to automate tasks and streamline operations
  • Excellent written and oral communication skills, ensuring effective interaction with both technical and non-technical stakeholders
  • Hands-on experience in private or hybrid cloud environments, such as Azure, GCP, AWS, OCI, or OpenStack

Nice to Have 

  • Professional experience administering Kubernetes in cluster environments.
  • Familiarity with infrastructure-as-code tools (Terraform, Puppet, Ansible, Chef, etc.)
  • Experience with high throughput networking technologies, RDMA, NCCL, GPUDirect, SLURM, or distributed GPU training systems
  • Experience with ML/AI/deep learning and a solid understanding of the hardware, software, and tools used in this domain
  • Experience with NVIDIA data center GPUs and distributed or parallel file systems

About Lambda

  • We offer generous cash & equity compensation
  • Investors include Gradient Ventures, Google’s AI-focused venture fund
  • We are experiencing extremely high demand for our systems, with quarter-over-quarter, year-over-year profitability
  • Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
  • We have a wildly talented team of 150 that is growing fast
  • Health, dental, and vision coverage for you and your dependents
  • Commuter/Work from home stipends
  • 401k Plan
  • Flexible Paid Time Off Plan that we all actually use

Salary Range Information 

Based on market data and other factors, the salary range for this position is $85,000-$140,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description. 

A Final Note:

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.

Job Profile

Countries

United States

Benefits/Perks

  • 401(k) Plan
  • Cash & equity compensation
  • Commuter/Work from home stipends
  • Flexible Paid Time Off Plan
  • Health, dental, and vision coverage

Skills

  • Ansible
  • Bash scripting
  • Deep Learning
  • High throughput networking technologies
  • Infrastructure-as-code tools
  • Kubernetes
  • Kubernetes administration
  • Linux administration
  • ML/AI/deep learning
  • NVIDIA data center GPUs
  • OpenStack
  • Python
  • Python scripting
  • Terraform

Tasks

  • Collaborate with engineering teams
  • Contribute to new product development
  • Document solutions
  • Engage with customers
  • Participate in on-call schedule
  • Provide training and education
  • Resolve software and hardware issues