Site Reliability Engineer
Remote, United States
Overview
The Site Reliability Engineer (SRE) is responsible for leading the continuous evolution of the capabilities needed to ensure the reliable delivery and operation of the software solutions that enable Cotiviti to retrieve medical records from healthcare providers.
The SRE works closely with architects, development teams, production operations, and product owners to enable the appropriate level of reliability to meet business objectives. The SRE serves as a mentor and role model, providing thought leadership and collaborating with a cross-functional team to drive the continuous improvement of SDLC and production operations. They improve reliability by focusing on monitoring, productivity, performance, and availability.
The SRE has three primary areas of responsibility:
- Operations: emergency incident response; change management; infrastructure management
- System support: ensure system stability; production operations enablement
- Process improvement: post-incident reviews; improve software development, deployment, and release practices; improve support practices; recommend changes to solution architecture
Collaborating with stakeholders, the SRE: defines business-aligned Service Level Indicators and Objectives; implements capabilities which provide real-time insight into the health of applications and the development pipelines; implements process and technology changes; automates routine SDLC tasks.
The SRE possesses a deep understanding of AWS cloud-native services and makes sure the team employs the correct strategies and tactics to ensure the reliability of the applications and services operating on AWS.
Responsibilities
- Ability to translate functional and nonfunctional requirements and strategies into solution reliability strategy, architecture, and roadmap in collaboration with development team members and other architects.
- Ability to define key business-value-aligned Service Level Indicators (SLIs) and Service Level Objectives (SLOs), and to automate SLI/SLO tracking through observability tools.
- Ability to lead data-driven improvement in reliability of the software solution.
- Ability to apply SRE principles and practices to solutions built using AWS cloud-native services, such as but not limited to:
  - API Gateways
  - Lambda functions built using NestJS/Node.js
  - Datastores (DynamoDB, OpenSearch, RDS, S3, HealthLake)
  - Event messaging technologies (SQS, EventBridge, Kinesis)
  - Logging/Tracing (CloudWatch, X-Ray)
  - Infrastructure as Code (Terraform)
- Ability to drive continuous process and technology improvements to increase the reliability of deployments and releases
- Coaching/training development team members as necessary to drive improvements in the teams’ delivery of the solution.
- Support the continuous evolution of best practices and standards for solution reliability
- Complete all responsibilities as outlined in the annual Performance Plan.
Qualifications
- Proven track record of applying SRE principles and practices to drive reliable software delivery and operation
- Self-starter with a passion for delivering reliable, mission-critical solutions which delight customers
- Expert in applying process improvement methodologies (Lean, Six Sigma, Kaizen, etc.) to software engineering practices
- Bachelor’s degree in Computer Science, Information Technology or related field, or equivalent work experience
- 10+ years of experience in at least two IT disciplines (such as data/solution architecture, technical/infrastructure architecture, information/data architecture, and business architecture) in a multitier enterprise environment
- 5+ years of recent experience leading the implementation of SRE in support of large development teams
- 5+ years of hands-on experience implementing site reliability engineering practices; expert with tools and technologies used to improve software reliability at scale
- 10+ years working in an Agile model, SAFe preferred
- Prior hands-on experience with greenfield software development
- Ability to apply data-driven decision making when evaluating architecture alternatives, balancing cost, complexity, time-to-market, and other factors
- Basic knowledge of financial models and budgeting
- Strong problem solving and critical thinking skills
- Exceptional interpersonal skills including teamwork, facilitation, coaching, and negotiation
- Excellent written and verbal communication skills
- Strong leadership skills
Mental Requirements:
- Communicating with others to exchange information.
- Assessing the accuracy, neatness, and thoroughness of the work assigned.
Physical Requirements and Working Conditions:
- Remaining in a stationary position, often standing or sitting for prolonged periods.
- Communicating with others to exchange information.
- Repeating motions that may include the wrists, hands, and/or fingers.
- Assessing the accuracy, neatness, and thoroughness of the work assigned.
- No adverse environmental conditions are expected.
- Must be able to provide a dedicated, secure work area.
- Must be able to provide high-speed internet access/connectivity and office setup and maintenance.
Base compensation ranges from $140,000 to $170,000. Specific offers are determined by various factors, such as experience, education, skills, certifications, and other business needs.
This role is eligible for discretionary bonus consideration.
Cotiviti offers team members a competitive benefits package to address a wide range of personal and family needs, including medical, dental, vision, disability, and life insurance coverage, 401(k) savings plans, paid family leave, 9 paid holidays per year, and 17-27 days of Paid Time Off (PTO) per year, depending on specific level and length of service with Cotiviti. For information about our benefits package, please refer to our Careers page.
Since this job will be based remotely, all interviews will be conducted virtually.
Date of posting: 12/13/2024
Applications are assessed on a rolling basis. We anticipate that the application window will close on 03/12/2025, but the application window may change depending on the volume of applications received or close immediately if a qualified candidate is selected.