Senior Backend Engineer - Adaptive Telemetry (Remote, USA)
United States (Remote)
This is a remote position. We are looking for candidates in the Eastern time zone of the USA.
What is Grafana Cloud?
Grafana Cloud is our composable observability platform that integrates metrics, logs, traces, and profiles with Grafana. It allows our customers to leverage the best open source observability software – including Prometheus, Mimir, Loki, Tempo, and Pyroscope – without the overhead of installing, maintaining and scaling their own observability stack.
The Databases department owns and operates our telemetry databases: Mimir for metrics, Loki for logs, Tempo for traces, and Pyroscope for profiles. We offer these databases as a Cloud service supporting Grafana Cloud, and additionally as on-premises solutions with Grafana Enterprise Metrics, Grafana Enterprise Logs, and Grafana Enterprise Traces. They are multi-tenant distributed systems implemented in Go and running on Kubernetes across all major Cloud service providers (GCP, Azure, AWS).
Adaptive Telemetry Group
The Adaptive Telemetry group, part of the Databases department, has the mission of ensuring that all telemetry stored in our databases is worthy of attention. The group has already developed Adaptive Metrics and Adaptive Logs and is now iterating on them while expanding the same vision to other telemetry signals. These services operate at a large scale, and performance is key to keeping the offering competitive and running smoothly.
An Adaptive Telemetry engineer has various work streams. They are typically engaged in a larger project with another engineer, while mixing in performance and reliability improvements discovered through operating the system in production. They are also responsible for writing PRs and design documents, reviewing those of other engineers in the squad, shepherding automated release rollouts, and participating in the on-call rotation for their systems.
As a company we are remote-first and global. We embrace people of different experiences and backgrounds to build diverse teams where every person brings a new perspective to the software. Our tech stack is mostly made up of services written in Go, running on multiple Kubernetes clusters that leverage Cloud object storage.
What will you be doing?
- Take an active role in influencing our roadmap and your own career objectives.
- Work with your team to deliver new features, then use the results to iterate and improve.
- Drive projects from initial ideation all the way to operations once they are in the hands of customers.
- Contribute to other projects that may not directly fall within your team’s scope.
- Design, build, operate, and maintain critical systems, owning their reliability, performance, and availability.
- Be a part of your team’s follow-the-sun on-call rotations and take ownership of the services you’re running.
- Mentor and support other team members, participate in design discussions and collaborate with the team.
- Learn new skills by gaining a deeper understanding of our cloud product and our customers and getting to know the codebase of a large distributed system.
As we are remote-first and our engineering organization is largely remote, we provide guidance and meet regularly using video calls, so an independent attitude and good communication skills are a must.
What are we looking for in you?
You are a motivated self-starter with a bias towards action. You are customer focused: we build everything with our users in mind. You have a passion for creating intuitive products that fit customers’ needs.
- Pragmatism: You are able to take on complex challenges and break them down to achieve short feedback loops: to analyze, design, and build modular solutions, deliver MVPs, gather data and feedback and then progress iteratively
- Collaboration and communication: The smallest unit we have is a squad. You’ll be working with your teammates in a fully remote setup. Good communication skills are a must
- Solid experience with at least one programming language. We use Go, but if you have familiarity with Python, C, C++, Rust or similar then that translates well
- Some experience with delivering projects in a self-driven way, from gathering requirements and brainstorming ideas all the way to shipping a product into customers’ hands
- Some experience with developing software that runs in the Cloud or some experience with systems engineering
- Some experience working with microservices architectures and distributed systems.
- Experience writing clean, robust, and performant software that is easily maintained by others
Nice to haves:
- Experience working with Kubernetes
- Experience working with Kafka
- Experience using Grafana and Prometheus in operational roles (including on-call for your team at a previous employer or just using these tools on hobby/homelab projects)
- Familiarity with being on-call and performing operations/SRE tasks or with the concept of infrastructure as code
In the USA, the base compensation range for this role is $148,000 - $178,000. Actual compensation may vary based on level, experience, and skillset as assessed in the interview process. Benefits include equity, bonus (if applicable) and other benefits listed here.
*Compensation ranges are country-specific. If you are applying for this role from a different location than listed above, your recruiter will discuss your specific market’s defined pay range & benefits at the beginning of the process.
About Grafana Labs: There are more than 20M users of Grafana, the open source visualization tool, around the globe, monitoring everything from beehives to climate change in the Alps. The instantly recognizable dashboards have been spotted everywhere from a NASA launch and Minecraft HQ to Wimbledon and the Tour de France. Grafana Labs also helps more than 3,000 companies, including Bloomberg, JPMorgan Chase, and eBay, manage their observability strategies with the Grafana LGTM Stack, which can be run fully managed with Grafana Cloud or self-managed with the Grafana Enterprise Stack, both featuring scalable metrics (Grafana Mimir), logs (Grafana Loki), and traces (Grafana Tempo).
Benefits: For more information about the perks and benefits of working at Grafana, please check out our careers page.
Equal Opportunity Employer: At Grafana Labs we’re building a company where a diverse mix of talented people want to come, stay, and do their best work. We know that our company runs on the hard work and the dedication of our passionate and creative employees. If you're excited about this role but your experience doesn’t align perfectly with every qualification in the job description, we encourage you to apply anyway. We will recruit, train, compensate and promote regardless of race, religion, color, national origin, gender, disability, age, veteran status, and all the other fascinating characteristics that make us different and unique. We believe that equality and diversity build a strong organization and we’re working hard to make sure that’s the foundation of our organization as we grow.
For information about how your personal data is used once you’ve applied to a job, check out our privacy policy.