Solutions Architect, Data Center Infrastructure
US, TX, Remote, United States
NVIDIA is seeking a Solutions Architect in Data Center Infrastructure to join our Infrastructure Specialists team. Academic and commercial groups worldwide are using NVIDIA products to redefine deep learning, data analytics, and power data centers. Join the team building many of the world's largest and fastest data centers and supercomputers! NVIDIA is looking for someone who can lead the planning and deployment of AI data centers, including power/cooling systems, cabling, network provisioning, and bring-up/validation.
As the NVIS Solutions Architect for Data Center Infrastructure, you will focus on data center audit, planning, and deployment, ensuring the integrity of NVIDIA platform infrastructure. Your primary goal will be to guarantee that all aspects of the data center's physical infrastructure are meticulously planned, implemented, and validated to meet NVIDIA reference architectures, operational requirements, and industry standards. This infrastructure includes architectural systems, power distribution, liquid/air cooling systems, compute, network and cabling (fiber and copper), and telemetry systems.
What you will be doing:
NVIS Datacenter Engineering and planning: Collaborate with other teams to plan and implement data center infrastructure solutions based on NVIDIA Datacenter reference architecture, including power distribution, cooling systems, network architecture, server hardware, and storage systems.
Plan and manage deployment of NVIDIA's pioneering AI infrastructure solutions, including highly complex rack-scale, liquid-cooled compute and networking hardware systems, in a fluid and fast-paced environment.
Conduct pre-deployment planning, including reviewing cluster and data center architecture, planning network port mapping and the fiber optic cabling BOM, identifying potential risks, training vendors, and finding areas for improvement.
Evaluate customers' and partners' infrastructure design proposals for consistency with industry standards and regulatory requirements. Provide feedback and recommendations to improve performance, scalability, and cost-effectiveness.
Perform testing, troubleshooting and validation of compute systems based on collaboration with product and engineering teams.
Act as the NVIS mentor providing guidance, mentorship, and support to ensure the NVIS team's success in their respective roles.
Quality Assurance: Establish and enforce quality assurance processes to verify that deployments meet established specifications and performance benchmarks. Conduct thorough bring-up, testing, and validation to confirm the functionality and reliability of infrastructure components.
Continuous Improvement: Drive continuous improvement initiatives to enhance the efficiency, resilience, and sustainability of data center infrastructure for the NVIDIA data center reference architecture and deployment blueprint. Find opportunities to streamline processes, automate repetitive tasks, and leverage emerging technologies to optimize infrastructure operations.
Collaboration and Communication: Collaborate and communicate across internal teams, external vendors, and customers to facilitate the seamless integration of data center infrastructure solutions. Serve as a domain expert and point of contact for infrastructure-related inquiries and blocking issues.
What we need to see:
Bachelor's degree (or equivalent experience) in Engineering, Computer Science, Information Technology, or a related field.
3+ years of overall experience in enterprise and/or hyperscale data centers with continual infrastructure deployment experience, preferably for high density AI/HPC data centers.
Working experience in data center operations, or infrastructure management roles, focusing on large-scale data center deployments.
Strong technical knowledge of and experience with the data center stack: power distribution, liquid cooling, servers, networking, storage, and pre-deployment planning.
Relevant certification – preferred
Demonstrated technical and project leadership under fluid situations, ability to adapt to unknowns and change.
Excellent analytical, problem-solving, and decision-making skills, keen attention to detail, and a commitment to quality.
Excellent communication and interpersonal abilities, capable of engaging with collaborators such as customers to enable productive discussions.
Organization & Time Management – able to plan, schedule, and organize tasks related to the job to achieve goals within or ahead of established time frames.
Willingness to travel (40%).
Ways to stand out from the crowd:
Linux system administration skills
Strong knowledge of the whole data center infrastructure stack
Flexible/agile and enjoys solving challenging problems
NVIDIA is widely considered one of the world's most desirable employers in technology. We have some of the world's most forward-thinking and passionate people working for us. If you're creative and autonomous, we want to hear from you!
The base salary range is 120,000 USD - 235,750 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.