FreshRemote.Work

Staff Site Reliability Engineer

Remote

About Dutchie

Founded in 2017, Dutchie is a comprehensive technology platform powering dispensary operations, while providing consumers with safe and easy access to cannabis. Dutchie aims to further support the positive societal change the cannabis industry brings to the world through wellness benefits, social justice, and empowering local communities through tax revenue. Powering thousands of dispensaries across 40+ markets throughout the United States and Canada, Dutchie is the leading technology company in the cannabis space and was named in Fast Company’s 10 Most Innovative Companies in North America and listed two years in a row on LinkedIn’s Top 50 Startups.

Dutchie has raised over $600M in funding to date, backed by D1 Capital Partners, Tiger Global, Dragoneer, DFJ Growth, Thrive Capital, Howard Schultz, Snoop Dogg’s Casa Verde Capital, Gron Ventures, members of the founding team at DoorDash, Kevin Durant’s Thirty Five Ventures, and other notable angel investors.

About This Job

We are seeking an experienced and talented Staff Site Reliability Engineer to join our Reliability Engineering team. As a Staff SRE, you will play a key role in ensuring the reliability, scalability, and performance of our infrastructure and applications. You will work closely with cross-functional teams to design, build, and maintain systems that deliver exceptional user experiences and improve the uptime and availability of the company’s products and services.

What You'll Do...

  • Collaborate with development and operations teams to identify and implement solutions for improving system reliability, performance, and availability.
  • Design and implement automation strategies for provisioning, configuration, and monitoring of infrastructure and applications.
  • Lead incident response efforts, ensuring timely and effective resolution of issues and conducting thorough post-mortems for continuous improvement.
  • Utilize tools such as Datadog for observability and Splunk for logging to enhance monitoring, alerting, and logging capabilities.
  • Enable application teams across the company to better instrument and improve observability of their services while also enhancing overall system reliability.
  • Conduct regular performance analysis and capacity planning to proactively address potential issues and optimize system performance.
  • Implement and manage monitoring, alerting, and logging systems to ensure the early detection of issues.
  • Contribute to the design and implementation of disaster recovery and business continuity plans.
  • Stay current with industry trends, emerging technologies, and best practices to continually enhance the reliability and efficiency of our systems.
  • Troubleshoot and resolve complex issues in production environments.
  • Participate in on-call rotation to ensure 24/7 availability of our systems and services.
  • Lead and mentor junior …

Job Profile

Benefits/Perks

  • Equity packages
  • Medical benefits

Tasks

  • Implement process improvements
  • Lead incident response efforts
  • Participate in on-call rotation

Skills

Agile, AWS, Azure, GCP, Go, Infrastructure as Code, Kubernetes, Monitoring, Networking, Python, Security, Shell

Experience

8+ years

Education

Bachelor's degree in Computer Science, Information Technology, or a related field