Senior Site Reliability Engineer (REMOTE)
Irvine, CA
Overview:
As a Senior Site Reliability Engineer at Weedmaps, you will work cross-departmentally with your partners on the application, infrastructure and quality teams to enhance the performance, reliability, resilience and scalability of the web services that make up Weedmaps.com. We are a cloud-native organization with 100% of our services in Docker running on Kubernetes in AWS’ public cloud. We also leverage observability, monitoring, CI/CD automation and custom tooling to push multiple production releases a day.
Your day-to-day focus will be leveraging your engineering skills to help build and monitor our systems, reduce developer toil, configure CI workflows and improve our deployment pipelines. You will also be a knowledge reference for our development teams, ensuring they use consistent tooling for metrics, logging, build and deployment. You will work closely with the development and infrastructure teams to identify the essential service-specific metrics (beyond the golden metrics) that need to be monitored, and work with application development teams to create libraries that make it easy for them to instrument their services.
The impact you'll make:
- Collaborate with stakeholders to drive best practices for monitoring and CI/CD pipelines
- Troubleshoot deployment issues in our CI/CD pipeline
- Advocate emphatically for the DevOps culture here at Weedmaps
- Identify areas for automation and embrace the codification of all things
- Evangelize best practices around collaboration, reliability, security and performance to all partner teams
- Take ownership of the application configuration/scaling for given services to ensure that they are following the established practices of the organization
- Create and refine synthetic monitoring flows
- Help teams understand the reliability of their services using metrics and observability
What you've accomplished:
- Minimum 5 years of experience at startup/mid-sized companies
- Proficiency in at least one of Python, Go, Node, Ruby or Elixir
- Experience using/operating Kubernetes in a production environment.
- Effective communication skills, a positive attitude, and ability to give and receive constructive feedback
- Ability to learn fast and adapt to new environments and change.
- Strong bias for action and strong decision-making capabilities.
- Must be capable of self-managing. Prioritization and time management are an absolute must.
- Professional experience with cloud-native observability standards such as OpenMetrics, OpenTracing and OpenCensus
- Expertise using/configuring modern CI/CD workflows
- Intimate understanding of, and experience implementing, SLIs, SLOs and SLAs …
Job Profile
Benefits: Accident Insurance, Company holidays, Critical illness insurance, Dental, Disability Insurance, Employee Training, FSA, Generous PTO, HSA contribution, Identity theft protection, Medical, Mental health benefits, Paid parental leave, Paid Sick Leave, Pet Insurance, Student Loan Repayment, Vision
Tasks:
- Automate processes
- Collaborate on best practices
Skills: Automation, AWS, CI/CD, CloudWatch, Datadog, DevOps, Docker, Elixir, Engineering, GCP, GitHub, Go, Grafana, HashiCorp, Infrastructure as Code, Kubernetes, Legal, Monitoring, Node, Observability, Product, Prometheus, Python, Ruby, Site Reliability Engineering, Terraform
Experience: 5 years
Education:
Timezones: America/Anchorage, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, Pacific/Honolulu, UTC-10, UTC-5, UTC-6, UTC-7, UTC-8, UTC-9