Lead Site Reliability Engineer

Remote, USA

AppOmni Remote-first

USD 164K+ Full Time Senior

Published 2 months ago

About AppOmni

AppOmni, a leader in SaaS Security, helps customers achieve secure productivity with their applications. Security teams and owners can quickly detect and mitigate threats using unmatched depth of protection, continuous monitoring, and comprehensive visibility. Trusted by over 25% of the Fortune 100, AppOmni specializes in securing diverse SaaS environments.

About the Role

Your work will have a direct and meaningful impact on the integrity and security of our customers’ and their customers’ data - including your own! Our core values are customer experience, quality, and trust. At the end of the day, we succeed when our customers can confidently understand and manage the security and configuration critical to their business.

What You’ll Do

As a Lead Site Reliability Engineer (SRE), you will be on the front lines of our efforts to continuously improve the reliability, scalability, and performance of the AppOmni platform and spearhead our engineering enablement efforts. As a key contributor, you will get to solve interesting challenges, build scalable and secure infrastructure solutions, develop internal tools and frameworks, and ensure high availability of our platform’s services. You will collaborate closely with product experts, software developers, quality engineers, and organizational leaders to deliver robust, automated, and reliable software that delights our customers.

You will be a key figure in identifying system bottlenecks, optimizing processes, and driving a culture of engineering excellence. This role requires a blend of skills across cloud infrastructure, automation, platform development, and top-notch operational practices.

Key Responsibilities:

Lead the design, implementation, and maintenance of reliable, scalable platforms to support the development and deployment of cloud-native applications.
Monitor system performance and troubleshoot platform issues.
Optimize alerting, logging, and resource utilization to ensure platform and application reliability.
Develop, maintain, and optimize CI/CD pipelines for rapid and reliable software delivery.
Implement automation frameworks to eliminate manual processes.
Implement and manage infrastructure as code (IaC) to automate infrastructure provisioning and scaling.
Lead capacity planning and platform performance optimization efforts.
Participate in contingency and disaster recovery planning, demand forecasting, and system performance tuning.
Champion best practices to enhance infrastructure agility, resiliency, and security.
Participate in our on-call rotation and incident management practices.
Manage Kubernetes platforms and resources using deployment tools and patterns such as Helm, Knative, and GitOps.