Systems Engineer (Automation)
Roseland, NJ / Brooklyn, NY / Remote
CoreWeave is a specialized cloud provider, delivering GPU compute resources at massive scale on top of the industry’s fastest and most flexible infrastructure. CoreWeave builds cloud solutions for compute-intensive use cases (VFX and rendering, machine learning and AI, batch processing, and Pixel Streaming) that are up to 35 times faster and 80% less expensive than the large, generalized public clouds. Learn more at www.coreweave.com.
About the Role:
CoreWeave is seeking a highly skilled and motivated Systems Automation Engineer to join our Kernel HAVOCK Team, reporting to the Manager of Systems Engineering. In this role, you will play a crucial part in the design, development, and optimization of our bare-metal systems, from POST through joining a Kubernetes cluster. The team’s primary responsibilities include maintaining a custom Linux kernel, various Ubuntu-based OS images, the virtualization stack (KubeVirt/QEMU/VFIO), and the container/pod runtime stack (containerd/nydus/kubelet). You will collaborate closely with cross-functional teams, upstack engineering teams, and stakeholders to deliver performant and reliable software.
Kernel Hardware - Acceleration - Virtualization - Operating Systems - Containerization - Kubelet
Our Team’s Stack:
- Linux Kernel (custom build)
- Intel/AMD CPUs, NVIDIA GPUs, DPUs, InfiniBand and Ethernet NICs
- ARM CPUs
- KubeVirt, QEMU, SR-IOV, vfio-pci
- Ubuntu
- Containerd, Kubelet
Responsibilities:
- Develop and maintain tooling to support both stateless and stateful systems
- Automate packaging of critical components (e.g., drivers, microcode, and components with out-of-tree patches)
- Build long-lived, well-documented tooling for automating tasks across the full lifecycle of systems (e.g., hardware/software inventory management, hardware diagnostics, and performance benchmarking)
- Build and serve as point of contact for CI/CD pipelines for reproducible software builds, OS images, and kernel/OS-level testing
- Collaborate with cross-functional teams to define Linux and OS requirements, specifications, and system architecture
Requirements:
- 5+ years of professional experience maintaining clusters of Linux servers
- Fluency with a programming language geared toward automation (Python preferred, but others possible)
- Experience adhering to structured release cycles
- Experience building CI/CD pipelines (GitHub or GitLab)
- Experience implementing automated testing
- Ability to effectively prioritize and communicate proposed features and fixes
- Strong passion for automation, with a commitment to automating processes comprehensively
- Excellent documentation skills and attention to detail
- Strong analytical and problem-solving abilities