Sr Software Engineer – Infrastructure, Telemetry and Site Reliability Engineer - *Remote*

Irvine, CA, United States

About the Role

We are seeking a skilled Sr Software Engineer – Infrastructure Telemetry and Site Reliability Engineer (SRE) to join our dynamic platform team. The ideal candidate will be responsible for ensuring the reliability, availability, and performance of our systems while leveraging telemetry data to enhance monitoring and observability. This role is critical in maintaining our high service standards and continuously improving our infrastructure.

Key Responsibilities

  • Lead the design, development, and implementation of monitoring, logging, and alerting solutions to ensure system reliability and performance.
  • Utilize telemetry data to identify and troubleshoot issues, optimize system performance, and enhance overall observability.
  • Collaborate with development and operations teams to ensure seamless integration of monitoring and alerting tools.
  • Write and maintain scripts for infrastructure management and automation (e.g., Python, PowerShell, Bash). 
  • Automate repetitive tasks to improve efficiency and reduce manual intervention.
  • Automate deployment pipelines using CI/CD tools such as Jenkins, GitHub Actions, or Azure DevOps.
  • Participate in on-call rotations and incident response, providing timely resolution to system outages and performance issues.
  • Develop and maintain documentation for system architecture, processes, and procedures related to telemetry and site reliability.
  • Design and implement cloud infrastructure using Infrastructure as Code (IaC) tools such as Terraform, AWS CloudFormation, or Azure Resource Manager.
  • Collaborate with cross-functional teams to design and implement scalable and resilient infrastructure solutions.
  • Conduct root cause analysis of incidents and implement corrective actions to prevent recurrence.
  • Drive the adoption of best practices in site reliability engineering and telemetry within the organization.

Qualifications

  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
  • 5+ years of software engineering experience with a focus on site reliability engineering, DevOps, IaC, cloud infrastructure, or a related area.
  • Strong knowledge of monitoring, logging, and alerting tools (e.g., Datadog, Prometheus, Grafana, ELK stack, Splunk, New Relic).
  • Proficiency in programming and scripting languages (e.g., Python, Go, Bash).
  • Experience with cloud platforms (e.g., AWS, Azure, Google Cloud) and containerization technologies (e.g., Docker, Kubernetes).
  • Strong understanding of Linux/Unix systems and networking concepts.
  • Excellent problem-solving skills and the ability to troubleshoot complex issues in distributed systems.
  • Experience with configuration management and automation tools (e.g., Terraform, Ansible, Puppet, Chef).
  • Strong communication and collaboration skills, with the ability to work effectively in a team-oriented environment.
  • Familiarity with CI/CD pipelines and tools (e.g., Jenkins, GitLab CI, CircleCI) is a plus.

Preferred Qualifications

  • Experience with site reliability engineering practices and principles, such as error budgets and service level objectives (SLOs).
  • Knowledge of data analytics and the ability to interpret and visualize telemetry data.
  • Experience with incident management and post-incident analysis.
  • Understanding of IT security best practices and tools.
  • Strong problem-solving skills and attention to detail.
  • Effective communication and collaboration skills.

Salary Range by Location:  

Alaska: Anchorage: Min: $53.90, Max: $91.78

Alaska: Kodiak, Seward, Valdez: Min: $56.19, Max: $95.67

California: Humboldt: Min: $56.19, Max: $95.67

California: All Northern California - Except Humboldt: Min: $63.04, Max: $107.34

California: All Southern California - Except Bakersfield: Min: $56.19, Max: $95.67

California: Bakersfield: Min: $53.90, Max: $91.78

Idaho: Min: $47.96, Max: $81.67

Montana: Except Great Falls: Min: $43.40, Max: $73.89

Montana: Great Falls: Min: $41.11, Max: $70.00

New Mexico: Min: $43.40, Max: $73.89

Nevada: Min: $56.19, Max: $95.67

Oregon: Non-Portland Service Area: Min: $50.25, Max: $85.56

Oregon: Portland Service Area: Min: $53.90, Max: $91.78

Texas: Min: $41.11, Max: $70.00

Washington: Western - Except Tukwila: Min: $56.19, Max: $95.67

Washington: Southwest - Olympia, Centralia & Below: Min: $53.90, Max: $91.78

Washington: Tukwila: Min: $56.19, Max: $95.67

Washington: Eastern: Min: $47.96, Max: $81.67

Washington: South Eastern: Min: $50.25, Max: $85.56

Why Join Providence? 

Our best-in-class benefits are uniquely designed to support you and your family in staying well, growing professionally, and achieving financial security. We take care of you, so you can focus on delivering our Mission of caring for everyone, especially the most vulnerable in our communities.

Accepting a new position at another facility that is part of the Providence family of organizations may change your current benefits. Changes in benefits, including paid time-off, happen for various reasons. These reasons can include changes of Legal Employer, FTE, Union, location, time-off plan policies, availability of health and welfare benefit plan offerings, and other various reasons.

At Providence, our strength lies in Our Promise of “Know me, care for me, ease my way.” Working at our family of organizations means that regardless of your role, we’ll walk alongside you in your career, supporting you so you can support others. We provide best-in-class benefits and we foster an inclusive workplace where diversity is valued, and everyone is essential, heard and respected. Together, our 120,000 caregivers (all employees) serve in over 50 hospitals, over 1,000 clinics and a full range of health and social services across Alaska, California, Montana, New Mexico, Oregon, Texas and Washington. As a comprehensive health care organization, we are serving more people, advancing best practices and continuing our more than 100-year tradition of serving the poor and vulnerable.

The amounts listed are the base pay range; additional compensation may be available for this role, such as shift differentials, standby/on-call, overtime, premiums, extra shift incentives, or bonus opportunities.

Providence offers a comprehensive benefits package including a retirement 401(k) Savings Plan with employer matching, health care benefits (medical, dental, vision), life insurance, disability insurance, time off benefits (paid parental leave, vacations, holidays, health issues), voluntary benefits, well-being resources and much more. Learn more at providence.jobs/benefits.

Job Profile

Regions

North America

Countries

United States

Restrictions

California, Montana, Oregon, Texas, Washington

Benefits/Perks

Best-in-class benefits, Collaboration, Comprehensive benefits package, Financial Security, Health care benefits, Inclusive workplace, Paid parental leave, Well-being resources

Tasks

  • Automate tasks and deployment pipelines
  • Collaborate with teams for integration
  • Conduct root cause analysis
  • Design and implement monitoring solutions
  • Design cloud infrastructure
  • Develop documentation
  • Drive best practices in site reliability
  • Maintain documentation
  • Participate in on-call rotations
  • Utilize telemetry data for troubleshooting
  • Write and maintain scripts

Skills

Alerting tools, Analysis, Analytics, Ansible, Automation, AWS CloudFormation, Azure DevOps, Azure Resource Manager, Bash, Best Practices, Chef, CI/CD, CI/CD pipelines, CircleCI, Cloud, Cloud Infrastructure, Collaboration, Communication, Computer Science, Configuration Management, Data & Analytics, Datadog, DevOps, Diversity, Docker, Documentation, ELK stack, Engineering, GitHub Actions, GitLab CI, Go, Grafana, Health care, Infrastructure as Code, IT, IT Security, Jenkins, Kubernetes, Linux, Logging tools, Monitoring tools, Networking, New Relic, Operations, Organization, PowerShell, Problem-solving, Prometheus, Puppet, Python, Root Cause Analysis, Scripting Languages, Security Best Practices, Site Reliability Engineering, Software Engineering, Splunk, Terraform, UNIX

Experience

5 years

Education

Analytics, Bachelor's degree, Computer Science, Data Analytics, Design, Engineering, Equivalent experience, Insurance, Related Field

Timezones

America/Anchorage, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, Pacific/Honolulu, UTC-10, UTC-5, UTC-6, UTC-7, UTC-8, UTC-9