FreshRemote.Work

Site Reliability Engineer

United States - Remote

prosource.it is a global IT Managed Service provider working with medium-sized to enterprise-level global clients. We are looking for a Site Reliability Engineer who is interested in joining a global, enterprise-level team that delivers technical solutions to our internal business partners to drive processes and meet business requirements.

 

We understand that we need exceptional talent to accomplish our mission. We therefore place great emphasis on the people component of IT, and we strive constantly to attract, develop, and retain the best people. We cultivate an ethos and environment in which our people are focused, nurtured, and continually challenged to develop and improve their competencies in a fun and rewarding culture.

OVERVIEW:

We are seeking a seasoned Site Reliability Engineer (SRE) to join our team. The ideal candidate will have extensive experience in AWS infrastructure, a strong focus on security and reliability, and the ability to guide us through the SOC 2 compliance process. This role requires a proactive individual with a strong bias for action who can both recommend and implement solutions to ensure the stability and security of our systems, and who will perform the necessary SOC 2 work themselves.

Responsibilities

  • SOC 2 Compliance: The successful candidate will lead the team through its first SOC 2 compliance process and carry out the work involved. This means getting the digital estate ready, working with the SOC 2 auditors, and remediating findings as they come up.
  • AWS Infrastructure Management: Manage and optimize AWS services, ensuring high availability, reliability and efficiency.
  • System Monitoring and Automation: Develop and implement monitoring solutions to detect and address system issues proactively. Automate critical recovery processes to minimize downtime.
  • Incident Management: Respond to and resolve incidents quickly and effectively, ensuring minimal downtime and user impact.
  • System Design: Participate in system design and architecture to ensure scalability and resilience.
  • Security Focus: Identify and mitigate security vulnerabilities within the infrastructure. Ensure compliance with security best practices.
  • Tooling and Scripting: Use tooling to monitor and maintain infrastructure. Create scripts (Bash, AWS CLI, etc.) where needed for system interrogation, monitoring, and automation.
  • Disaster Recovery: Design and test disaster recovery plans to ensure data integrity and system availability.
  • Documentation: Maintain meticulous documentation of systems, processes, and configurations.

Qualifications:

  • Experience: Minimum of 5 years in a similar role, with at least two different organizations. Experience in well-established companies is preferred.
  • AWS Expertise: Proven experience with AWS services and infrastructure management. Familiarity with AWS security tools and best practices.
  • Security and Compliance: Strong background in security operations (SecOps) and experience with compliance frameworks such as SOC 2.
  • Scripting and Automation: Proficiency in scripting languages and automation tools. Ability to write and maintain scripts for system management and monitoring.
  • Problem-Solving Skills: Strong analytical skills to identify and resolve system issues. Ability to prioritize and address critical components.
  • Communication: Excellent communication skills to collaborate with team members and stakeholders. Ability to explain technical concepts to non-technical audiences.

Additional Skills (Nice to have):

  • CI/CD Pipelines: Knowledge of continuous integration and continuous deployment (CI/CD) processes and tools.
  • System Integration: Experience with integrating various systems and tools to create a cohesive infrastructure.
  • Monitoring Tools: Familiarity with monitoring tools such as CloudWatch and with visualizing data in Grafana.
  • Kubernetes and Containers: Experience with container management and orchestration.
  • Performance Tuning: Analyze system performance, identify bottlenecks, and implement optimizations to improve efficiency and speed.
  • Capacity Planning: Plan infrastructure for future capacity needs and ensure that systems can handle anticipated workloads.

EMPLOYMENT DETAILS

 

Location: Remote (USA-based), with visits to Denver, CO twice per quarter

Model: Full-time, 40+ hours/week, working in the Mountain Time Zone

Start Date: 1st May 2025

Engagement: W2 Salary (exempt from overtime)

Salary: $110K-$130K, depending on experience

Benefits

We provide all our full-time staff members with an exceptional benefits package, including medical, dental, vision, long-term disability, short-term disability, 401(k) contribution, paid holidays, and PTO.

Applicants for employment in the US must have work authorization that does not, now or in the future, require sponsorship of a visa for employment authorization in the United States. Applicants are also expected to provide references upon request.

Job Profile

Regions

North America

Countries

United States

Restrictions

Must have work authorization

Benefits/Perks

Dental, Long Term Disability, Nurturing environment, Paid holidays, Professional development, PTO, Rewarding culture, Short Term Disability, Vision

Tasks

  • Automate recovery processes
  • Develop monitoring solutions
  • Identify security vulnerabilities
  • Lead SOC 2 compliance
  • Maintain documentation
  • Manage AWS infrastructure
  • Participate in system design
  • Respond to incidents

Skills

Automation, AWS, CI/CD, Communication, Disaster Recovery, Documentation, Incident Management, Problem-solving, Reliability, Scripting, Security, SOC 2, System design, System Monitoring

Experience

5 years

Timezones

America/Anchorage, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, Pacific/Honolulu, UTC-10, UTC-5, UTC-6, UTC-7, UTC-8, UTC-9