Staff Software Engineer - Infrastructure Monitoring
Boston, MA, United States, New York, NY, United States, Remote
Datadog is seeking an experienced Staff Engineer with deep GPU experience (development and operations) to join our Infrastructure Monitoring team and help build out GPU-specific observability capabilities in our Infrastructure Monitoring products. This role will directly shape Datadog’s approach and posture toward building observability tooling for customers who use GPUs in their infrastructure. Example problems this person will solve include “How can we detect runtime issues across a fleet of GPUs, isolate the root cause, and provide actionable recommendations to resolve the issue?” and “How can we profile and optimize software running on GPUs?” The role involves significant cross-team work and collaboration with a number of Datadog product and platform teams, and requires the ability to go deep across many different product stacks.
What You'll Do:
- Develop a company-wide approach to GPU Observability across the 3 Pillars: Metrics, Logs, and Traces
- Collaborate with cross-functional teams to design and develop GPU-centric product offerings
- Drive high-priority, high-visibility products that expand Datadog’s penetration into the GPU market
- Lead architectural decisions for new and existing GPU-based observability products
- Identify opportunities for Datadog product enhancements to provide coverage for GPUs
- Contribute to short- and long-term planning and roadmap development
Who You Are:
- You have several years of experience leading cross-team initiatives in a platform or infrastructure-focused environment
- You have a deep understanding of GPUs and have developed for and operated them in production environments
- You are deeply familiar with at least one of the following areas: Data Science, Graphics Programming, or Large Language Models
- You have significant back-end programming experience and have architected, built, and operated distributed systems to solve problems at high scale
- You possess a deep understanding of the day-to-day responsibilities of an engineer and have a strong technical background
- You have excellent verbal and written communication skills and are comfortable presenting and defending your ideas to both technical and non-technical audiences
- You have a BS/MS/PhD in Computer Science, Engineering, or a related scientific field, or equivalent experience
Datadog offers a competitive salary and equity package, which may include variable compensation. Actual compensation is based on factors such as the candidate's skills, qualifications, and experience. In addition, Datadog offers a wide range of best-in-class, comprehensive, and inclusive employee benefits for this role, including healthcare, dental, parental planning, and mental …
Job Profile
401(k) Plan, 401k plan and match, Competitive salary and equity, Competitive salary and equity package, Dental, Discounted employee stock purchase plan, Employee stock purchase plan, Fitness reimbursements, Health benefits, Healthcare, Mental health benefits, Paid Time Off, Parental Planning, Professional development, Stock Purchase Plan, Variable Compensation
Tasks: Lead architectural decisions
Back-end Programming, Cloud, Collaboration, Communication, Datadog, Data Science, Design, Distributed Systems, Go, Infrastructure, Infrastructure monitoring, Large Language Models, Legal, Monitoring, Observability, Programming, SaaS
Experience: 5 years
Education: B.S., Business, Computer Science, Engineering, MS, Ph.D., Related scientific field, Technology
Timezones: America/Anchorage, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, Pacific/Honolulu, UTC-10, UTC-5, UTC-6, UTC-7, UTC-8, UTC-9