Senior Backend Engineer - Grafana Ops, Alerting (Remote, US/Canada)
Canada (Remote)
This is a remote position. We are looking for candidates in the US and Canada only.
What we do
The Grafana Alerting squad operates at the core of the Grafana open-source project, and our mission is to provide the most in-depth way to let users manage their alerts. The backend-focused team works closely with customers and the Grafana Cloud teams to make Alerting work on-prem and at infinite scale in Cloud.
Our domain is quite large, so to get an idea of the kind of work we do, here are some projects we recently worked on:
- Build distributed systems to scale alert ingestion and delivery of Prometheus-based alerts
- Unify Alerting setup and delivery to work the same for Grafana and Prometheus-based alerts
- Build distributed systems to scale alert evaluation of Grafana-managed alerts, reaching over 500 evaluations per second in production
Grafana Alerting is trusted by major organizations worldwide, monitoring essential medical devices and critical infrastructure. We thrive on collective creativity and diverse perspectives; every team member is encouraged to contribute ideas that shape our product into a dependable tool.
What will you be doing?
- Take an active role in influencing our roadmap and your own career objectives
- Work with your team to deliver new features, then use the results to iterate and improve
- Drive projects from initial ideation all the way to operations once they are in the hands of customers
- Embrace our open-source culture and contribute to other projects that may not directly fall within your team’s scope
- Design, build, operate, and maintain critical systems, owning the reliability, performance, and availability
- Be a part of your team’s on-call rotations and take ownership of the services you’re running
- Mentor and support other team members, participate in design discussions and collaborate with the team
- Learn new skills by gaining a deeper understanding of our cloud product and our customers and getting to know the codebase of a large distributed system
As we are remote-first and our engineering organization is largely remote, we provide guidance and meet regularly using video calls, so an independent attitude and good communication skills are a must.
What are we looking for in you?
- You are a motivated self-starter with a bias towards action
- You are customer-focused
- We …
Job Profile
Fully remote
Benefits/Perks: Bonus, Collaborative culture, Diversity, Equity, Fully remote, Independent attitude, Other benefits, Remote-first company, Remote work, Skill development
Tasks:
- Deliver new features
- Drive projects
- Feedback
- Mentor team members
- On-call rotations
- Participate in design discussions
- Provide guidance
Alerting, C, C++, Cloud, Collaboration, Communication, Dashboards, Design, Distributed Systems, Engineering, Evaluation, Go, Grafana, Grafana Cloud, Infrastructure as Code, Java, Kubernetes, Logs, Loki, Metrics, Microservices, Microservices architecture, Mimir, Monitoring, Observability, Open Source, Operations, Prometheus, Python, Rust, SRE, Support, Systems Engineering, Tempo, Traces, Visualization
Experience: 5 years
Timezones: America/Edmonton, America/Moncton, America/Regina, America/St_Johns, America/Toronto, America/Vancouver, UTC-3, UTC-4, UTC-5, UTC-6, UTC-7, UTC-8