Site Reliability Engineer (SRE)
Remote
At Invisible AI, we are building the future of computer vision. Today, our core focus is on developing an end-to-end platform that can digitize manufacturing operations. We deploy edge AI cameras to digitize every step of manual assembly work, which helps people-driven manufacturing be accurate, reliable, and safe. Coming from the world of self-driving cars, the founders of Invisible AI have years of experience building and deploying large-scale AI and machine learning pipelines. Join us and help build a company that will deliver the endless possibilities of computer vision to real-world customers!
As a Site Reliability Engineer, you will build the technology to enable our platform to deploy, run, and monitor Invisible AI’s software at scale across tens of independent deployments and thousands of devices. The SRE works closely with all other engineering teams and owns internal tools to enable faster development and deployment, like secure ephemeral debug environments, streamlined access controls, CI/CD systems, and a custom in-house device management platform for device configuration and software releases.
Responsibilities:
- Design, build, and maintain scalable and resilient infrastructure on the edge.
- Develop automation and infrastructure-as-code solutions using Terraform, Ansible, and scripting languages (Python, Bash).
- Deploy and manage containerized applications using Docker and related technologies.
- Ensure system observability by building and optimizing monitoring systems, particularly using Prometheus.
- Troubleshoot and optimize Linux-based systems (e.g., Red Hat, CentOS, Ubuntu).
- Collaborate with security teams to implement robust security practices and ensure compliance with best practices.
- Work closely with software engineers to improve system performance, reliability, and deployment pipelines.
- Support and maintain networking infrastructure, including troubleshooting protocols and configurations.
- Manage cloud and on-premise infrastructure, with a focus on automation and scalability.
- Contribute to incident response, postmortems, and process improvements.
Requirements:
- 5+ years of experience building and managing infrastructure at scale, particularly on the edge.
- Proficiency in Python, Docker, Linux systems, and scripting (Bash, Python).
- Strong expertise with infrastructure automation tools (Terraform, Ansible).
- Experience managing observability and monitoring systems, particularly Prometheus.
- Deep understanding of networking concepts and protocols.
- Familiarity with cloud platforms (AWS, Azure, Google Cloud) is a plus.
- Experience with Windows Services/VMs is a plus.
- Excellent problem-solving skills, with attention to detail.
- Strong communication and collaboration skills to work across teams.
- Bachelor’s degree in Computer Science, Information Technology, or a related field, or equivalent experience.
Job Profile
Restrictions: Remote
Benefits/Perks: Equity, Sales incentive pay, Total compensation package
Tasks:
- Build and maintain infrastructure
- Collaborate with Security teams
- Deploy containerized applications
- Develop automation solutions
- Ensure system observability
- Manage cloud and on-premise infrastructure
- Troubleshoot Linux systems
Experience: 5 years
Education: Bachelor's degree in Computer Science, Engineering, Information Technology, or a related field