Data Reliability Engineer
Hybrid / Bungie-Approved Remote Locations
Data Reliability Engineering at Bungie is a core team of the Central Tech area that keeps our games and tooling running at scale. Our team owns the overall scalability, observability and resilience of the databases, data processing platforms and in-memory key-value stores used throughout the Bungie ecosystem. We partner with our engineering teams and business units on projects, services, designs, and processes We are the stewards of architecture and provide tools and services to enable engineering teams to meet their design requirements.
RESPONSIBILITIES
- Design, deploy, and maintain highly available and scalable data infrastructure components including Kafka, RabbitMQ, Redis, Elasticsearch, and Graphite
- Perform capacity planning and scalability assessments for data platforms
- Troubleshoot and resolve issues related to data processing pipelines, message queuing, and performance including participation in on-call rotation
- Ensure data security, integrity, and compliance with industry best practices and regulatory requirements
- Document system configurations, procedures, and operational knowledge
- Advise service owners on industry and company standards and best practices
- Maintain reliability and performance levels for core data platform infrastructure
- Data observability strategy and implementation
- Data ownership strategy and documentation
REQUIRED SKILLS
- Strong understanding of Linux operating systems and their administration
- Effective communication skills and ability to collaborate effectively in a team environment
- Experience with infrastructure automation and configuration management (e.g., Ansible, Terraform…)
- Excellent troubleshooting skills and the ability to analyze and resolve complex infrastructure resource and application deployment issues
- Experience working in a distributed production environment
- Deep understanding of cluster management areas, such as scaling, consistency tuning, replication, and multi-datacenter configuration
- Familiarity with time-series monitoring systems & tools (e.g., Datadog, Prometheus, Grafana and ELK)
- Experience designing and implementing logging and metric pipelines
This range is determined by an array of factors, including training, transferable skills, work experience, business needs, and market demands. Additionally, it is subject to change and may be modified in the future.
In the journey to make incredible worlds, Bungie employees don't simply do meaningful work - they receive meaningful support. Competitive salaries and discretionary bonus opportunities are just the start. We offer comprehensive healthcare coverage, generous …
This job isn't fresh anymore!
Search Fresh JobsJob Profile
RestrictionsBungie-approved remote locations Hybrid
Benefits/Perks401(k) matching Competitive salaries Comprehensive healthcare Comprehensive healthcare coverage Discretionary bonuses Discretionary bonus opportunities Flexible time off Flexible time off policies Healthcare coverage Hybrid work Paid parental leave Social clubs
SkillsAnsible Cluster Management Communication Configuration Management Datadog Documentation ElasticSearch ELK Grafana Infrastructure Automation Kafka Linux Observability Prometheus RabbitMQ Redis Resilience Scalability Terraform Troubleshooting