Site Reliability Engineer

Remote, USA

Doxel

Published 1 month ago


Doxel AI is hiring a Site Reliability Engineer to join the Engineering team to focus on the robustness and performance of our systems and processes. This role will accelerate our mission to ensure every decision on a construction site is a great one.
Construction is the second-largest industry in the world (4x the size of SaaS!). But unlike software teams, which have observability platforms such as AppDynamics and Datadog, construction teams lack automated feedback loops to help projects stay on schedule and on budget. Without this observability, construction wastes a whopping $3T per year because glitches aren’t detected fast enough to recover.
Doxel AI exists to bring computer vision to construction, so the industry can deliver what society needs to thrive. From hospitals to data centers, from foremen to VPs of construction, teams use Doxel to make better decisions every day. In fact, Doxel has contributed to the construction of the facilities that provide many of the products and services you use every day.
We have classic computer vision, deep learning ML object detection, a low-latency 3D three.js web app, and a complex data pipeline powering it all in the background. We’re building out new workflows, analytics dashboards, and forecasting engines. Join us in bringing AI to construction!
The Role
Doxel Engineers produce the foundation for Doxel's construction insights. This includes the behind-the-scenes technology that snapshots hundreds of thousands of square feet of construction activity per day, the software that ingests hundreds of gigabytes of data per site per day, and the state-of-the-art web application that renders this data in a useful, performant manner for our customers.
We are looking for our first Site Reliability Engineer (SRE) to bring a holistic approach to the reliability, availability, and performance of our systems, reducing the frequency of service disruptions and increasing the customer’s trust and delight in our product. You care deeply about reliable, low-upkeep software. You bring curiosity and a can-do attitude to work, eager to dive into ambiguous problems and bring order and reliability to our systems and processes so that they can scale far beyond our current use cases.

What You'll Do

System Monitoring: Setting up and maintaining monitoring systems to track performance metrics and uptime.
Incident Management: Optimizing our incident response processes, performing root cause analysis, and ensuring quick recovery from outages.
Infrastructure Management: Designing and maintaining scalable and reliable infrastructure, including servers, databases, networks, and …