Lead Site Reliability Engineer
Remote, USA
About AppOmni
AppOmni, a leader in SaaS Security, helps customers achieve secure productivity with their applications. Security teams and owners can quickly detect and mitigate threats using unmatched depth of protection, continuous monitoring, and comprehensive visibility. Trusted by over 25% of the Fortune 100, AppOmni specializes in securing diverse SaaS environments.
About the Role
Your work will have a direct and meaningful impact on the integrity and security of our customers' and their customers' data - including your own! Our core values are customer experience, quality, and trust. At the end of the day, we succeed when our customers can confidently understand and manage the security and configuration critical to their business.
What You'll Do
As a Lead Site Reliability Engineer (SRE), you will be on the front lines of our efforts to continuously improve the reliability, scalability, and performance of the AppOmni platform and spearhead our engineering enablement efforts. As a key contributor, you will get to solve interesting challenges, build scalable and secure infrastructure solutions, develop internal tools and frameworks, and ensure high availability of our platform's services. You will collaborate closely with product experts, software developers, quality engineers, and organizational leaders to deliver robust, automated, and reliable software that delights our customers.
You will be a key figure in identifying system bottlenecks, optimizing processes, and driving a culture of engineering excellence. This role requires a blend of skills across cloud infrastructure, automation, platform development, and top-notch operational practices.
Key Responsibilities:
- Lead the design, implementation, and maintenance of reliable, scalable platforms to support the development and deployment of cloud-native applications.
- Monitor system performance and troubleshoot platform issues.
- Optimize alerting, logging, and resource utilization to ensure platform and application reliability.
- Develop, maintain, and optimize CI/CD pipelines for rapid and reliable software delivery.
- Implement automation frameworks to eliminate manual processes.
- Implement and manage infrastructure as code (IaC) to automate infrastructure provisioning and scaling.
- Lead capacity planning and platform performance optimization efforts.
- Participate in contingency and disaster recovery planning, demand forecasting, and system performance tuning.
- Champion best practices to enhance infrastructure agility, resiliency, and security.
- Be a part of our on-call rotation and incident management practices.
- Manage Kubernetes platforms and resources using deployment tools and patterns such as Helm, Knative, and GitOps.
What We're Looking For:
- Bachelor's degree …
Job Profile
Employee recognition Pay equity Remote-first company
Tasks
- Champion best practices
- Develop CI/CD pipelines
- Lead capacity planning
- Monitor system performance
- On-Call Rotation
- Participate in disaster recovery planning
Alerting, Ansible, Automation, AWS, Azure, Bash, CI/CD, Cloud, Cloud Computing, CloudFormation, Cloud Infrastructure, Collaboration, Communication, Development, DevOps, GCP, GitHub, GitHub Actions, GitLab CI, Golang, Grafana, Implementation, Incident Management, Infrastructure as Code, Kubernetes, Monitoring, Networking, Operational practices, Platform Development, Prometheus, Pulumi, Python, Research, SaaS, SaaS Security, Security, Sentry, Site Reliability Engineering, Terraform
Experience: 7 years
Education: Bachelor's degree, Business, Computer Science, Engineering, Related Field, Software Engineering
Timezones: America/Anchorage, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, Pacific/Honolulu, UTC-10, UTC-5, UTC-6, UTC-7, UTC-8, UTC-9