Senior Manager, Engineering - SRE Network Operations Center (NOC)
MD Chevy Chase (Office) - JPS, United States
GEICO is seeking a dynamic, highly motivated Senior Manager to join our Reliability Engineering organization to oversee our Network Operations Center (NOC), a central point of communication and incident management across our TECH organization. You will be part of a team that facilitates incident calls and measures and improves production performance, availability, and reliability through sustainable engineering practices for our mission-critical systems. You will work closely with our Product, Platform, Security, and other Infrastructure teams to continuously automate and improve our products’ availability to our customers. You will also manage a team of NOC engineers with varied technology expertise who are passionate about triaging issues and collaborating with Product groups across the organization to resolve them effectively and efficiently.
The Senior Manager, Engineering - SRE Network Operations Center (NOC) is the cornerstone of the NOC’s transformation into an SRE-centric organization. This role demands a visionary leader who can bridge the gap between traditional NOC practices and modern Incident Response SRE methodologies, ensuring that the NOC operates with maximum efficiency, reliability, and resilience. The Senior Manager is responsible for the strategic direction, leadership, and operational management of the NOC SRE team, ensuring the team delivers on its 24/7/365 mission.
Key Responsibilities:
Strategic Leadership: Define the strategic direction for the NOC with a focus on adopting and embedding SRE practices across all operational processes. This includes promoting a culture of continuous improvement, automation, and reliability engineering.
Team Leadership: Lead a team of 15+ Incident Response SRE engineers, providing guidance, mentorship, and support to ensure high performance. This includes managing performance reviews, professional development, and career growth opportunities for team members.
Incident Management: Serve as the ultimate incident commander during critical incidents, ensuring that the incident response process is handled efficiently and effectively from detection to resolution. This includes overseeing incident communication and ensuring that all stakeholders are informed and aligned.
Operational Oversight: Develop and maintain a robust schedule that ensures 24/7/365 coverage by the NOC SRE team, optimizing shift patterns and staffing levels to meet operational demands.
SRE Transformation: Drive the adoption of SRE practices, working closely with senior leadership to implement changes that reduce toil, enhance reliability, and improve the overall incident response process. This includes spearheading cultural changes within the organization to embrace SRE principles.
Executive Communication: Oversee the creation and delivery of high-quality incident communication reports to executives and key stakeholders, ensuring that all communications are clear, accurate, …
Job Profile
Office presence required
Benefits/Perks: Career growth opportunities, Dental, Flexible hours, Flexible work environment, Health and well-being, Medical, Mentorship opportunities, Paid training, Paid Training and Licensures, Paid Vacation, Parental leave, Professional development, Total Rewards Program, Tuition Assistance, Vision, Vision Insurance
Tasks:
- Automation
- Communicate with executives
- Documentation
- Oversee incident management
- Problem solving
Skills: Agile, Agile methodologies, Automation, AWS, Azure, Bash, Building, Cloud, Cloud Computing, Communication, Compliance, Continuous Improvement, Development, Documentation, Dynatrace, Engineering, Engineering Practices, Grafana, IaaS, Incident Management, Incident Response, Java, Kanban, Leadership, Linux, Management, Mentorship, Monitoring, Networking, Network Operations, Observability, Operating Systems, Operations, Organization, Organizational, PaaS, Performance, Perl, PowerShell, Presentation, Prometheus, Python, Reliability, Reliability Engineering, SaaS, Scripting, Scrum, Security, Splunk, SQL, SRE, Storage, Team Leadership, Time Management, Troubleshooting, Windows, Written communication
Experience: 5 years
Education: Computer Science, Engineering, Information Technology, Related Field, Work experience
Timezones: America/Anchorage, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, Pacific/Honolulu, UTC-10, UTC-5, UTC-6, UTC-7, UTC-8, UTC-9