FreshRemote.Work

AI Application & GPU Support Engineer - REMOTE

Atlanta, GA, US

Req ID: 317296 

NTT DATA strives to hire exceptional, innovative and passionate individuals who want to grow with us. If you want to be part of an inclusive, adaptable, and forward-thinking organization, apply now.

We are currently seeking an AI Application & GPU Support Engineer - REMOTE to join our team in Atlanta, Georgia (US-GA), United States (US).

These AI Platform Specialist roles provide application and GPU support. The team will deliver Tier 1 and Tier 2 support to developers and engineers while collaborating closely with Tier 3 and 4 platform teams and vendors for issue resolution. The roles require user knowledge of Kubernetes, virtualization, and cloud-native technologies as well as operator knowledge of GPUs and other AI supporting services. Each specialist should focus on customer service along with goals of reliability, scalability, and performance.

 

Day to Day Responsibilities Include:

  • Platform Support & Incident Response
    • Provide Tier 1 & Tier 2 support for AI-driven applications and workloads.
    • Troubleshoot and resolve issues related to GPU utilization and service performance.
    • Collaborate with Tier 3+ teams, including Kubernetes engineers and external vendors, to escalate and resolve complex issues.
  • GPU Infrastructure & AI Services Management
    • Optimize and support GPU-enabled workloads including CUDA and other AI acceleration frameworks.
    • Assist in the installation, configuration, and support of AI coding assistants (e.g., Codeium).
  • Observability & Documentation
    • Maintain detailed operational documentation, runbooks, and troubleshooting guides.
    • Utilize monitoring/logging tools like New Relic, Big Panda, Prometheus, Grafana, and other observability frameworks.
  • Process Improvement & Collaboration
    • Work cross-functionally with developers, IT teams, and vendors to ensure seamless deployment and support of AI services.
    • Contribute to CI/CD pipelines, automation, service, and security best practices.
    • Track and communicate work through task management platforms (ServiceNow and Jira).

Minimum Requirements:

  • 5+ Years in Hybrid Cloud – In-depth knowledge of private (on-premises) and public (GCP & AWS) cloud architectures and services.
  • 3+ Years AI/ML Software – Developer experience with DevOps practices (Git, Jenkins, etc.) as well as experience working with AI/ML engineers and data scientists.
  • 3+ Years AI/ML Hardware – Experience deploying, supporting, and optimizing on-premises and cloud GPU-enabled (NVIDIA & AMD) infrastructure (VMs & containers).
  • 3+ Years of experience with GPU orchestration tools like Run:AI, NVIDIA AI Enterprise, VMware Private AI Foundation, etc.
  • 1+ Years of exposure to AI coding assistants like Codeium, Copilot, or Tabnine.

Preferences:

  • Proficient in development tools like Python, PyTorch, TensorFlow, Jupyter Notebooks, etc.
  • Technical Support & Troubleshooting – Proven ability to diagnose and resolve customer and platform issues in production environments.
  • Strong Communication & Documentation – Ability to clearly document procedures, write knowledge base articles, and collaborate with customers and teams.
  • Time Management & Accountability – Ability to work independently, prioritize tasks, and manage workload effectively.

Where required by law, NTT DATA provides a reasonable range of compensation for specific roles. The starting pay range for this remote role is $109,275 - $227,656. This range reflects the minimum and maximum target compensation for the position across all US locations. Actual compensation will depend on a number of factors, including the candidate’s actual work location, relevant experience, technical skills, and other qualifications.

 

About NTT DATA

NTT DATA is a $30 billion trusted global innovator of business and technology services. We serve 75% of the Fortune Global 100 and are committed to helping clients innovate, optimize and transform for long-term success. As a Global Top Employer, we have diverse experts in more than 50 countries and a robust partner ecosystem of established and start-up companies. Our services include business and technology consulting, data and artificial intelligence, industry solutions, as well as the development, implementation and management of applications, infrastructure and connectivity. We are one of the leading providers of digital and AI infrastructure in the world. NTT DATA is a part of NTT Group, which invests over $3.6 billion each year in R&D to help organizations and society move confidently and sustainably into the digital future. Visit us at us.nttdata.com.

NTT DATA endeavors to make https://us.nttdata.com accessible to any and all users. If you would like to contact us regarding the accessibility of our website or need assistance completing the application process, please contact us at https://us.nttdata.com/en/contact-us. This contact information is for accommodation requests only and cannot be used to inquire about the status of applications. NTT DATA is an equal opportunity employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or protected veteran status. For our EEO Policy Statement, please click here. If you'd like more information on your EEO rights under the law, please click here. For Pay Transparency information, please click here.

Job Profile

Regions

North America

Countries

United States

Restrictions

Remote role

Benefits/Perks

Growth Opportunities, Inclusive environment, Remote work

Tasks
  • Collaborate
  • Collaborate with tier 3+ teams
  • Communication
  • Configuration
  • Contribute to CI/CD pipelines
  • Documentation
  • Implementation
  • Maintain operational documentation
  • Optimize GPU workloads
  • Process Improvement
  • Provide tier 1 & tier 2 support
  • Resolve issues
  • Support
  • Troubleshoot GPU issues
  • Troubleshooting
  • Work cross-functionally

Skills

AI, AI acceleration frameworks, AI/ML, AMD, Applications, Artificial Intelligence, Automation, AWS, Big Panda, CI/CD, Click, Cloud, Cloud Native technologies, Coding, Collaboration, Communication, Consulting, CUDA, Customer service, DevOps, Documentation, GCP, Git, GPU support, Grafana, Implementation, Incident Response, Jenkins, Jira, Jupyter notebooks, Kubernetes, Logging, Monitoring, New Relic, NVIDIA, Process Improvement, Prometheus, Python, PyTorch, R, Run, Security, Security Best Practices, ServiceNow, Teams, Technical Support, TensorFlow, Time Management, Troubleshooting, Virtualization

Experience

5 years

Education

Business IT Management

Certifications

AWS, VMware

Timezones

America/Anchorage, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, Pacific/Honolulu, UTC-10, UTC-5, UTC-6, UTC-7, UTC-8, UTC-9