
Site Reliability Engineer, Observability - Roseland, NJ / Hybrid or Remote

CoreWeave is a specialized cloud provider delivering GPU compute resources at massive scale on top of the industry’s fastest and most flexible infrastructure. CoreWeave builds cloud solutions for compute-intensive use cases (VFX and rendering, machine learning and AI, batch processing, and Pixel Streaming) that are up to 35 times faster and 80% less expensive than the large, generalized public clouds. Learn more at www.coreweave.com.

About the role:

The Observability Team plays a critical role in enabling CoreWeave to understand, troubleshoot, and optimize complex systems by providing comprehensive insight into their behavior and performance. The team is responsible for developing, integrating, and operating observability platforms, with the ultimate goal of enabling engineers across CoreWeave to do more, better. Central to the Observability Team’s mission is operating our observability stack, which leverages CoreWeave’s deep investment in the Kubernetes ecosystem.

We are seeking a Site Reliability Engineer specializing in the observability stack who can help us execute on our mission of providing a comprehensive logging and metrics ecosystem, integrating logging, metrics, tracing, and monitoring tools to deliver proactive insight into system performance. This individual will work with a team of 6-8 engineers and have the opportunity to take on the full gamut of rewarding challenges that come with building a cloud, all in a communicative, supportive, and high-performing environment. As a member of the Observability Team, you will have the opportunity to:

  • Design and implement the platform that improves visibility into how our services are performing and operating.
  • Improve the performance, security, reliability, and scalability of our observability and related services, and participate in the team’s on-call rotation.
  • Assist engineers in maximizing …


Job Profile

Countries

United States

Skills

Go Grafana Kubernetes Linux Organization Scripting Shell scripting

Tasks

  • Analyze data for insights

Experience

1 year

Education

Business Engineering

Restrictions

Hybrid or Remote