Linux Support Engineer III

Remote (US)

Lambda

Published 2 months ago

Lambda's GPU cloud is used by deep learning engineers at Stanford, Berkeley, and MIT. Lambda's on-prem systems power research and engineering at Intel, Microsoft, Kaiser Permanente, major universities, and the Department of Defense.

If you'd like to build the world's best deep learning cloud, join us.

What You’ll Do

Engage directly with customers to deeply understand their challenges, ensuring a personalized and effective support experience.
Dive into complex software and hardware issues, providing timely and efficient solutions.
Craft comprehensive documentation of solutions and contribute to enhancing support procedures, ensuring continuous improvement in service quality.
Identify common customer pain points and collaborate closely with engineering teams to develop innovative solutions, constantly improving the overall customer experience.
Collaborate in the development of new products, contributing your expertise to shape the future of deep learning cloud infrastructure.
Take escalations from your peers while looking for opportunities to train and educate them in the process.
Participate in a rotating on-call schedule where you’ll be responsible for major incidents and customer issues.

You

Proven experience in clustered or HPC environments, showcasing your mastery of Linux administration
Experience in administering the entire infrastructure stack, from hardware, to firmware and drivers, to OS and software support.
Proficiency in Bash or Python scripting, enabling you to automate tasks and streamline operations
Excellent written and oral communication skills, ensuring effective interaction with both technical and non-technical stakeholders
Hands-on experience in private or hybrid cloud environments, such as Azure, GCP, AWS, OCI, or OpenStack

Nice to Have

Professional experience administering Kubernetes in cluster environments.
Familiarity with infrastructure-as-code tools (Terraform, Puppet, Ansible, Chef, etc.)
Experience with high throughput networking technologies, RDMA, NCCL, GPUDirect, SLURM, or distributed GPU training systems
Experience with ML/AI/deep learning and a solid understanding of the hardware, software, and tools used in this domain
Experience with NVIDIA data center GPUs and distributed or parallel file systems

About Lambda

We offer generous cash & equity compensation
Investors include Gradient Ventures, Google’s AI-focused venture fund
We are experiencing extremely high demand for our systems, with quarter-over-quarter, year-over-year profitability
Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
We have a wildly talented team of 150 that is growing fast
Health, dental, and vision coverage for you and your dependents
Commuter/Work from home stipends
401k Plan
Flexible Paid Time Off Plan that we all actually use

Salary Range …

Job Profile

Regions

North America

Countries

United States

Benefits/Perks

401(k) Plan
Cash & equity compensation
Commuter/Work from home stipends
Flexible paid time off
Health, dental, and vision coverage

Tasks

Collaborate with engineering teams
Engage with customers
Participate in on-call schedule

Skills

Ansible
Bash scripting
Deep Learning
Kubernetes
Linux administration
OpenStack
Python
Python scripting
Terraform