Site Reliability Engineer - Networking Support - US, AZ, Remote

NVIDIA is looking for a Site Reliability Engineer (SRE) to join its Networking Support team. As an SRE at NVIDIA, you will ensure the reliability and uptime of our customers' production environments. We are seeking an SRE with the mindset and methodology to maintain, monitor, and troubleshoot data center (DC) networking equipment.

SRE's culture of diversity, intellectual curiosity, problem solving and openness is important to our success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to build an environment that provides the support and mentorship needed to learn and grow.

What you will be doing:

  • Supervise equipment, applications, and processes through various tools, applications, and consoles

  • Rapidly debug and triage incidents and user-reported issues

  • Work with Tier 2 and Tier 3 support as required

  • Contribute to the overall health, performance, and reliability of the networking equipment and infrastructure services

  • Develop documentation for Operations processes

  • Work rotating shifts, including weekends and holidays, and overtime as required

What we need to see:

  • BS or diploma in the Information Technology field, or equivalent experience

  • 4+ years of site reliability engineering experience working on large-scale distributed microservices in a production environment, with a real passion for automation and tooling

  • Must be able to operate network devices and pull cables over the racks in a data center environment

  • Ability to perform physical labor to rack and unrack network equipment in a data center

  • Expertise with incident management, organizational change, and problem management processes, including detection of all service-impacting issues, accurate triage, partner communication, impact containment, service restoration, and post-incident follow-up

  • Proven strengths in problem-solving and root-causing issues, while continuously seeking ways to drive optimization, efficiency, and the bottom line

  • Experience performing operational activities, including batch processing, system backups, and maintenance; providing Level 1 network and server support; responding to data center environmental alarms; and monitoring various application systems

  • Experience handling special requests for network configuration changes, system, server, and network switch reboots, file restores, web updates, and terminal messaging

  • Knowledge of TCP/IP networks and troubleshooting tools; knowledge of the Linux operating system and associated tools

  • Able to work a rotating shift schedule that includes days, nights, weekends and holidays as necessary

Ways to stand out from the crowd:

  • Strong networking background and familiarity with major routing/switching protocols and equipment

  • Knowledge of InfiniBand technology

  • Knowledge of water-cooled networking systems

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you.

The base salary range is 58,400 USD - 126,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

