Senior Observability Engineer

Remote Location, United States

TriNet is a leading provider of comprehensive human resources solutions for small to midsize businesses (SMBs). We enhance business productivity by enabling our clients to outsource their HR function to one strategic partner and allowing them to focus on operating and growing their core businesses. Our full-service HR solutions include features such as payroll processing, human capital consulting, employment law compliance and employee benefits, including health insurance, retirement plans and workers’ compensation insurance. 

TriNet has a nationwide presence and an experienced executive team. Our stock is publicly traded on the NYSE under the ticker symbol TNET. If you’re passionate about innovation and making an impact on the large SMB market, come join us as we power our clients’ business success with extraordinary HR.

Don't meet every single requirement? Studies have shown that women and people of color are less likely to apply to jobs unless they meet every single requirement. At TriNet, we are dedicated to building a diverse, inclusive and authentic workplace, so if you're excited about this role but your past experience doesn't align perfectly with every single qualification in the job description, we encourage you to apply anyway. You may just be the right candidate for this or other roles.

JOB SUMMARY
The role of a Senior Observability Engineer is to design, implement, and maintain comprehensive observability solutions for complex systems and applications. This position requires a deep understanding of monitoring and observability practices, as well as expertise in using various tools and technologies to collect and analyze performance, logging, and metrics data.

Essential Duties/Responsibilities

  • Monitoring Setup and Configuration: Set up and configure the monitoring tools to collect data from various systems, applications, and network components. This involves defining monitoring metrics, configuring data collection agents, and ensuring proper connectivity and access.
  • Alert Management: Monitor alerts generated by the tools and perform triage to identify critical issues. Analyze alert patterns, fine-tune alert thresholds, and configure alert escalation workflows to ensure timely response and resolution.
  • Performance Analysis and Troubleshooting: Utilize the tools' features and functionalities to analyze performance metrics, logs, and traces. Conduct investigations and root cause analysis to troubleshoot and resolve performance issues, identifying bottlenecks and areas for optimization.
  • Incident Response: Collaborate with cross-functional teams to respond to and resolve incidents in a timely manner. Engage in incident management processes, including incident triage, communication, and coordination with relevant stakeholders, and participate in post-incident reviews to identify areas for improvement.
  • Dashboard and Visualization: Create and maintain dashboards and visualizations using tools like Grafana, providing a consolidated view of system health, performance, and key metrics. Customize dashboards to meet specific business and operational requirements and share them with relevant teams and stakeholders.
  • Capacity Planning and Scalability: Monitor resource utilization and performance trends to forecast capacity requirements. Collaborate with capacity planning teams to plan and provision resources based on anticipated growth and workload patterns, ensuring scalability and optimal performance.
  • Tool Administration and Maintenance: Perform routine administration tasks for the observability tools, such as user management, access control, and system upgrades or patching. Monitor the health and availability of the tools themselves, ensuring their reliability and functionality.
  • Documentation and Knowledge Sharing: Document monitoring configurations, troubleshooting procedures, and best practices for future reference. Contribute to internal knowledge bases and collaborate with the team to share insights and lessons learned.
  • Tool Integration and Automation: Integrate observability tools with other systems and workflows, such as ticketing systems, incident management platforms, and automation frameworks. Automate monitoring configurations, data collection, and reporting processes to improve efficiency and reduce manual effort.
  • Continuous Improvement and Research: Stay updated with the latest developments in observability practices and technologies. Research and evaluate new tools and techniques that could enhance the monitoring and observability capabilities of the organization. Continuously improve existing monitoring setups, workflows, and processes to align with industry best practices.
  • Performs other duties as assigned
  • Complies with all policies and standards
     

QUALIFICATIONS

Education

  • Bachelor's Degree in computer science or other highly technical, scientific subject area preferred  

Work Experience

  • Typically 5+ years of experience with systems engineering and/or information technology
     

Knowledge, Skills and Abilities

  • Demonstrated knowledge of and experience administering application and cloud infrastructure monitoring
  • Hands-on experience with Prometheus and Grafana
  • Hands-on experience with Elasticsearch (AWS OpenSearch) and Oracle Logging Analytics, or similar tools such as Datadog, Splunk, or Sumo Logic
  • Hands-on experience with the APM tool AppDynamics, or similar tools such as Dynatrace or New Relic
  • Scripting Language experience (Python preferred)    
  • Strong understanding of web services and Swagger is a plus
  • Experience with CI/CD pipelines    
  • Attitude to thrive in a fun, fast-paced environment.    
  • Ability to excel at problem solving, adapt easily to change, and contribute effectively both individually and as part of cross-functional teams.    
  • Proficiency in Infrastructure as Code (IaC), particularly CDK and Terraform, is highly desirable.   
  • Passion for DevOps, Application/API monitoring, automation, and reliability    
     

Work Environment:

  • Work in a clean, pleasant, and comfortable home or office setting. The work environment characteristics described here are representative of those an employee encounters while performing the essential functions of this job. Reasonable accommodations may be made to enable persons with disabilities to perform the essential functions.
  • This position may be considered remote and may require reliable and consistent internet service.
     

Travel Requirements
Minimal

The salary range for this role is $76,000 to $182,400. The candidate’s final salary offer will be based on the candidate’s skills, education, work location and experience.

A candidate’s compensation may also include bonuses consistent with TriNet’s corporate bonus plan.

Additionally, subject to applicable eligibility requirements, TriNet offers permanent full-time employees a variety of benefits including medical, dental, and vision plans, life and disability insurance, a 401(k) savings plan, an employee stock purchase plan, eleven (11) Company-observed holidays, PTO and a comprehensive leave program. Please click the following link for detailed information about our benefits offerings: https://www.trinet.com/documents/blt5b61a1040aae1904

Please Note: TriNet reserves the right to change or modify job duties and assignments at any time. The above job description is not all encompassing. Position functions and qualifications may vary depending on business necessity.

TriNet is an Equal Opportunity Employer and does not discriminate against applicants based on race, religion, color, disability, medical condition, legally protected genetic information, national origin, gender, sexual orientation, marital status, gender identity or expression, sex (including pregnancy, childbirth or related medical conditions), age, veteran status or other legally protected characteristics. Any applicant with a mental or physical disability who requires an accommodation during the application process should contact recruiting@trinet.com to request such an accommodation. 

Please note that at this time, TriNet requires colleagues reporting to TriNet offices, engaging in in-person activities (including off-sites) or engaging in TriNet sponsored business travel, to be fully vaccinated (as defined by the CDC) against COVID-19 or provide proof of a negative PCR test each week. TriNet will consider requests for reasonable accommodations for documented medical reasons and for sincerely held religious beliefs in accordance with applicable law. TriNet is providing access to a mobile app for colleagues to submit proof of vaccination or negative test results. Please do not include proof of vaccine status or any indication of a possible request for an accommodation when submitting your application materials. If applicable, TriNet will follow up with you directly to request proof of vaccination and to discuss any potential accommodations.