Site Reliability Engineer

United States - Remote

prosource.it

USD 130K+ Full Time Senior

Published 6 days ago

prosource.it is a global IT managed service provider working with medium-sized to enterprise-level global clients. We are looking for a Site Reliability Engineer who is interested in joining a global, enterprise-level team that delivers technical solutions to our internal business partners to drive processes and meet business requirements.

We understand that we need exceptional talent to accomplish our mission. We therefore place great emphasis on the people component of IT, and we strive constantly to attract, develop, and retain the best people. We cultivate an ethos and environment within which our people are focused, nurtured, and continually challenged to develop and improve their competencies in a fun and rewarding culture.

OVERVIEW:

We are seeking a seasoned Site Reliability Engineer (SRE) to join our team. The ideal candidate will have extensive experience in AWS infrastructure, a strong focus on security and reliability, and the ability to guide us through the SOC 2 compliance process. This role requires a proactive individual with a strong bias for action who can both recommend and implement solutions to ensure the stability and security of our systems, guiding us through the SOC 2 process and performing the necessary work themselves.

Requirements

SOC 2 Compliance: Lead the team through its first SOC 2 compliance process and carry out the work involved. This means getting the digital estate ready, working with the SOC 2 auditors, and remediating findings as they come up.
AWS Infrastructure Management: Manage and optimize AWS services, ensuring high availability, reliability and efficiency.
System Monitoring and Automation: Develop and implement monitoring solutions to detect and address system issues proactively. Automate critical recovery processes to minimize downtime.
Incident Management: Respond to and resolve incidents quickly and effectively, ensuring minimal downtime and user impact.
System Design: Participate in system design and architecture to ensure scalability and resilience.
Security Focus: Identify and mitigate security vulnerabilities within the infrastructure. Ensure compliance with security best practices.
Tooling and Scripting: Utilize tooling to monitor and maintain infrastructure. Create scripts (BASH, AWS CLI, etc.) where needed for system interrogation, monitoring, and automation.
Disaster Recovery: Design and test disaster recovery plans to ensure data integrity and system availability.
Documentation: Maintain meticulous documentation of systems, processes, and configurations.

Qualifications:

Experience: Minimum of 5 years in a similar role, across at least two different organizations. Experience in well-established companies is preferred.
AWS Expertise: Proven experience with AWS services and infrastructure management. Familiarity with AWS security tools and best practices.
Security and Compliance: Strong background in security operations (SecOps) and experience with compliance frameworks such as SOC 2.
Scripting and Automation: Proficiency in scripting languages and automation tools. Ability to write and maintain scripts for system management and monitoring.
Problem-Solving Skills: Strong analytical skills to identify and resolve system issues. Ability to prioritize and address critical components.
Communication: Excellent communication skills to collaborate with team members and stakeholders. Ability to explain technical concepts to non-technical audiences.

Additional Skills (Nice to have):

CI/CD Pipelines: Knowledge of continuous integration and continuous deployment (CI/CD) processes and tools.
System Integration: Experience with integrating various systems and tools to create a cohesive infrastructure.
Monitoring Tools: Familiarity with monitoring tools such as CloudWatch and with visualizing metrics in Grafana.
Kubernetes and Containers: Experience with container management and orchestration.
Performance Tuning: Analyze system performance, identify bottlenecks, and implement optimizations to improve efficiency and speed.
Capacity Planning: Plan infrastructure for future capacity needs and ensure that systems can handle anticipated workloads.

EMPLOYMENT DETAILS

Location: Remote (USA-based), with visits to Denver, CO twice per quarter

Model: Full-time, 40+ hours/week, working in the Mountain Time Zone

Start Date: 1st May 2025

Engagement: W2 Salary (Exempt from OT)

Salary: $110K-$130K depending on experience

Benefits

We provide all our full-time staff members with an exceptional benefits package, including medical, dental, vision, long-term disability, short-term disability, 401k contribution, paid holidays, and PTO.

Applicants for employment in the US must have work authorization that does not, now or in the future, require sponsorship of a visa for employment authorization in the United States. Applicants are also expected to provide references upon request.
