Staff Software Engineer - Infrastructure Monitoring
Boston, MA, United States; New York, NY, United States; Remote
Datadog is seeking an experienced Staff Engineer with deep GPU experience (development and operations) to join our Infrastructure Monitoring team and help build out GPU-specific observability capabilities in our Infrastructure Monitoring products. This role will directly shape Datadog’s approach and posture toward building observability tooling for customers who use GPUs in their infrastructure. Example problems this person will solve are “How can we detect runtime issues over a fleet of GPUs, isolate the root cause, and provide actionable recommendations to resolve the issue?” and “How can we profile and optimize software running on GPUs?” The role involves significant cross-team work and collaboration with a number of Datadog product and platform teams, requiring the ability to go deep across many different product stacks.
What You'll Do:
- Develop a company-wide approach to GPU observability across the three pillars: metrics, logs, and traces
- Collaborate with cross-functional teams to design and develop GPU-centric product offerings
- Drive high-priority, high-visibility products that expand Datadog’s penetration into the GPU market
- Lead architectural decisions for new and existing GPU-based observability products
- Identify opportunities for Datadog product enhancements to provide coverage for GPUs
- Contribute to short- and long-term planning and roadmap development
Who You Are:
- You have several years of experience leading cross-team initiatives in a platform or infrastructure-focused environment
- You have a deep understanding of GPUs and have developed for and operated them in production environments
- You are deeply familiar with at least one of the following areas: Data Science, Graphics Programming, Large Language Models
- You have significant back-end programming experience and have architected, built, and operated distributed systems to solve problems at high scale
- You possess a deep understanding of the day-to-day responsibilities of an engineer and have a strong technical background
- You have excellent verbal and written communication skills and are comfortable presenting and defending your ideas to both technical and non-technical audiences
- You have a BS/MS/PhD in Computer Science, Engineering, or a related scientific field, or equivalent experience
Datadog offers a competitive salary and equity package, and may include variable compensation. Actual compensation is based on factors such as the candidate's skills, qualifications, and experience. In addition, Datadog offers a wide range of best-in-class, comprehensive, and inclusive employee benefits for this role, including healthcare, dental, parental planning, and mental health benefits, a 401(k) plan and match, paid time off, fitness reimbursements, and a discounted employee stock purchase plan.
The reasonably estimated yearly salary for this role at Datadog is: $234,000 to $300,000 USD
About Datadog:
Datadog (NASDAQ: DDOG) is a global SaaS business, delivering a rare combination of growth and profitability. We are on a mission to break down silos and solve complexity in the cloud age by enabling digital transformation, cloud migration, and infrastructure monitoring of our customers’ entire technology stacks. Built by engineers, for engineers, Datadog is used by organizations of all sizes across a wide range of industries. Together, we champion professional development, diversity of thought, innovation, and work excellence to empower continuous growth. Join the pack and become part of a collaborative, pragmatic, and thoughtful people-first community where we solve tough problems, take smart risks, and celebrate one another. Learn more about #DatadogLife on Instagram, LinkedIn, and Datadog Learning Center.
Equal Opportunity at Datadog:
Datadog is an Affirmative Action and Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. Here are our Candidate Legal Notices for your reference.
Your Privacy:
Any information you submit to Datadog as part of your application will be processed in accordance with Datadog’s Applicant and Candidate Privacy Notice.
Job Profile:
Experience: 5 years
Education: B.S., M.S., Ph.D.; Business, Computer Science, Engineering, Technology, or related scientific field
Timezones: America/Anchorage, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, Pacific/Honolulu, UTC-10, UTC-5, UTC-6, UTC-7, UTC-8, UTC-9