Staff SRE Engineer
North America - Remote
About Invisible
Invisible Technologies is the AI training and scaling partner for the leading foundation model providers, enterprises, and governments, bridging the gap between AI potential and production. Invisible's unique AI Process Platform combines elite global human expertise, cutting-edge technology, and deep institutional knowledge gained by training 80% of the world's leading AI models. Trusted by AWS, Microsoft, and Cohere, we have an unparalleled ability to operationalize AI for real-world applications. Our explosive growth landed us the #3 spot on the Inc. 5000 in 2024, closing the year on $134m revenue.
About The Role
We are always striving to build the right thing. You are a key partner for the Engineering and Product teams. You will focus your energy on driving reliability and automation for our products. The ideal candidate has learned from experience that technical decisions have far-reaching consequences. As an experienced professional engineer, you are always mindful to avoid technical debt and waste.
What You'll Do
- Ensure the availability, performance, and scalability of production systems
- Deploy, configure, automate, and manage cloud-based infrastructure using tools like Kubernetes, Terraform, and Argo
- Identify and resolve system bottlenecks, optimizing for performance and cost efficiency across engineering teams
- Design, support, and manage deployment pipelines to enable world-class delivery of applications
- Design, develop, and maintain comprehensive monitoring and observability systems using Datadog and Sentry
- Define Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to measure reliability and performance
- Design and implement automated solutions to reduce manual operational tasks
- Build tools for system provisioning, monitoring, deployment, and scaling
- Collaborate closely within engineering teams to improve application reliability, resilience, and maturity
What We Need
- Strong understanding of cloud architecture including expertise with major cloud providers (GCP, AWS, Azure)
- Proficiency in a programming language and ability to write production code beyond just scripting
- Understanding of the underlying networking and security considerations when developing the architecture of our deployment environments
- Strong understanding of relational databases (PostgreSQL), and comfort optimizing them and advising the broader engineering team on optimization techniques so that the data layer of our deployed services runs smoothly
- Strong understanding of authentication and authorization principles such as IAM, Security Groups, RBAC, etc.
- Understanding of software engineering fundamentals, practices, and patterns with distributed cloud services
- Strong experience with production systems troubleshooting …
Job Profile
Location-based eligibility requirements: Remote
Benefits/Perks
- Annual profit reinvestment
- Co-ownership
- Exceptional benefits
- Liquidity for shares
- Partner ownership
- Remote-first company
- Remote work
- Transparent pay model
Tasks
- Automate operational tasks
- Collaborate with engineering teams
- Define SLOs and SLIs
- Deploy and manage cloud infrastructure
- Design deployment pipelines
- Ensure system availability
- Maintain monitoring systems
- Optimize system performance
AI, AI models, AI Training, Argo, Authentication, Automation, AWS, Azure, Cloud Architecture, CloudFormation, Datadog, Design, Distributed cloud services, Engineering, GCP, IAM, Infrastructure, Infrastructure as Code, Kubernetes, Monitoring, Networking, Observability, Optimization, PostgreSQL, Programming, RBAC, Recruiting, Recruitment, Relational databases, Reliability, Security, Sentry, Software Engineering, SRE, Talent Acquisition, Technology, Terraform, Training
Experience: 5 years
Education