Site Reliability Engineer
Remote
The SRE will help support, enhance, and maintain our infrastructure and cloud services. Qualified candidates will demonstrate immediate technical aptitude, as well as a propensity for learning new tools and techniques quickly in a fast-paced environment. Excellent candidates will collaborate with the DevOps and development teams to help sustain a healthy, responsive system. The SRE team is the front line for supporting our system and developing a best-in-class monitoring platform. The candidate will propose enhancements to system health, performance, and reliability to deliver SaaS-based services for Restaurant365 customers.
How you'll add value:
- Responding to production incidents and determining how we can prevent them in the future.
- Triaging and troubleshooting production issues to ensure reliability and performance.
- Identifying and automating manual processes.
- Continuously evolving our monitoring tools and platform.
- Promoting and applying best practices for building scalable and reliable services across engineering.
- Developing and maintaining technical documentation/diagrams, runbooks, and procedures.
- Provide “Always On” support for a 24x7 online environment by participating in an on-call rotation, responding to production incidents, and contributing to root cause analysis and problem management.
- Automate public cloud environments using tools such as Terraform, Ansible, and CloudFormation.
- Work within strict time frames following change management protocols to provide maximum uptime.
- Implement, review, and adhere to security policies, and work with audit teams.
- Research and remediate system vulnerabilities.
- Interact and coordinate with architects, developers, vendors, and internal business partners.
- Maintain documentation of all cloud infrastructure-related components.
- Maintain a solid working knowledge of current infrastructure and future trends.
- Other duties as assigned.
What you'll need to be successful in this role:
- Extensive experience with SRE methodologies and processes.
- Automation expert with coding skills and a mindset to automate manual/repetitive tasks with PowerShell, Bash, Perl, PHP, or containers.
- Extensive scripting experience with Terraform, YAML, Ansible, and Python.
- Automation experience in public cloud environments, with a strong understanding of infrastructure as code.
- Experience in continuous deployment and lifecycle management using tools such as GitLab, Git, and Stash.
- Linux engineering skills and working knowledge of Windows.
- Working experience with Nginx and Apache Tomcat.
- Azure or AWS: 2+ years of hands-on administration and automation of Azure or AWS services (Azure AKS, Azure Functions, Azure Blob, AWS ECS, AWS EKS, Lambda, S3, ALB/ELB, etc.).
- Experience with Windows and Linux.
- Ability to effectively prioritize and execute tasks in a high-velocity environment.
- Minimum of 2 years of related experience with a bachelor's degree; or equivalent work experience.
- Strong written, oral, and interpersonal communication skills.
- AWS or Azure cloud certification is preferred.
- Preferred experience using Jira, Prometheus, Grafana, ELK, and Site24x7. Nagios is a bonus!
R365 Team Member Benefits & Compensation
- This position has a salary range of $100K-$130K. The above range represents the expected salary range for this position. The actual salary may vary based upon several factors, including, but not limited to, relevant skills/experience, time in the role, business line, and geographic location. Restaurant365 focuses on equitable pay for our team and aims for transparency with our pay practices.
- Comprehensive medical benefits, 100% paid for the employee
- 401k + matching
- Equity Option Grant
- Unlimited PTO + Company holidays
- Wellness initiatives