Cluster Administrator Engineer
Virtual - USA AZ
Job Description:
The Intel Labs Emergent AI team is looking for an experienced cluster administrator to manage HPC clusters. The right candidate will have experience with SLURM and related technologies and will be familiar with machine learning training and inference workloads (GPU and CPU).
In this role, your responsibilities will include:
- Serve as the primary contact for a GPU+CPU cluster
- Collect team feedback and relay it to the support team (schedule downtime/maintenance, propose changes to the cluster, etc.)
- Perform capacity planning to help determine compute/storage needs for the team moving forward
- Serve as the owner of the SLURM job scheduler, defining the configuration that best fits the team and developing/enabling advanced features
- Serve as the team datasets owner (manage the datasets that live in the cluster and how people access them)
- Help the team optimize/troubleshoot complex jobs/pipelines (AI centric, simulation, 3D graphics, etc.).
- Educate the team on how to use the cluster (SLURM, BeeGFS, datasets, etc.), enabling a fast ramp-up for new scientists and engineers (via tutorials, presentations, wiki docs, etc.)
- Communicate effectively with a variety of stakeholders, including presenting plans to higher management and having technical discussions with engineers/scientists.
Qualifications:
You must possess the below minimum education requirements and minimum required qualifications to be initially considered for this position. Relevant experience can be obtained through schoolwork, classes, project work, internships, and/or military experience. Additional preferred qualifications are in addition to the minimum requirements and are considered a plus factor in identifying top candidates.
Minimum Qualifications:
- Bachelor’s degree with 4+ years of experience, or Master’s degree with 3+ years of experience, in Electrical Engineering (EE), Computer Engineering (CE), Computer Science (CS), or a related field. Experience designing and managing large clusters with heterogeneous hardware (CPUs, GPUs, etc.) is essential.
- 10 or more years of demonstrated ability to understand and anticipate the needs of scientists and engineers through data analysis, and to develop a cluster growth plan that meets these needs effectively.
- 10 or more years of proven experience as a power user, willing to extensively test and optimize various workflows running in the cluster.
- 10 or more years of deep knowledge of cluster orchestration and management technologies, including SLURM, BeeGFS, Docker, and similar tools.
This position is not eligible for Intel immigration sponsorship.
Job Type:
Intel Contract Employee
Shift:
Shift 1 (United States of America)
Primary Location:
Virtual US
Additional Locations:
Business group:
Enable amazing computing experiences with Intel Software. Intel Software continues to shape the way people think about computing across CPU, GPU, and FPGA architectures. Get your hands on new technology and collaborate with some of the smartest people in the business. Our developers and software engineers work in all software layers, across multiple operating systems and platforms, to enable cutting-edge solutions. Ready to solve some of the most complex software challenges? Explore an impactful and innovative career in Software.
Posting Statement:
All qualified applicants will receive consideration for employment without regard to race, color, religion, religious creed, sex, national origin, ancestry, age, physical or mental disability, medical condition, genetic information, military and veteran status, marital status, pregnancy, gender, gender expression, gender identity, sexual orientation, or any other characteristic protected by local law, regulation, or ordinance.
Position of Trust
N/A
Benefits:
We offer a total compensation package that ranks among the best in the industry. It consists of competitive pay, stock, and bonuses, as well as benefit programs that include health, retirement, and vacation. Find more information about all of our Amazing Benefits here: https://www.intel.com/content/www/us/en/jobs/benefits.html
Annual Salary Range for jobs which could be performed in the US:
$52,000.00-$200,000.00
Salary range is dependent on a number of factors, including location and experience.
Work Model for this Role
This role is available as fully home-based and generally would require you to attend Intel sites only occasionally, based on business need. This role may also be available under our hybrid work model, which allows employees to split their time between working on-site at their assigned Intel site and off-site. In certain circumstances the work model may change to accommodate business needs.
The application window for this job posting is expected to end by 10/18/2024.
Job Profile
Not eligible for Intel immigration sponsorship. Virtual - USA AZ
Benefits/Perks: Access to new technology, Bonuses, Collaborative work environment, Competitive pay, Retirement, Stock, Total compensation package, Vacation
Tasks:
- Data Analysis
- Educate team on cluster usage
- Manage HPC clusters
- Management
- Optimize/troubleshoot jobs
- Perform capacity planning
- Schedule downtimes
AI, BeeGFS, Cluster Management, Computer Science, CPU, Data Analysis, Docker, Electrical Engineering, GPU, Graphics, Machine Learning, Operating Systems, Orchestration, Planning, Presentations, SLURM, Tutorials, Wiki documentation
Experience: 10 years
Education: Bachelor's, Business, Computer Engineering, Computer Science, Electrical Engineering, Engineering, Master's, Related Field
Timezones: America/Anchorage, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, Pacific/Honolulu, UTC-10, UTC-5, UTC-6, UTC-7, UTC-8, UTC-9