Staff Site Reliability Engineer

Remote

About Dutchie

Founded in 2017, Dutchie is a comprehensive technology platform that powers dispensary operations while providing consumers with safe and easy access to cannabis. Dutchie aims to further support the positive societal change the cannabis industry brings to the world through wellness benefits, social justice, and empowering local communities through tax revenue. Powering thousands of dispensaries across 40+ markets throughout the United States and Canada, Dutchie is the leading technology company in the cannabis space. It was named one of Fast Company’s 10 Most Innovative Companies in North America and listed two years in a row on LinkedIn’s Top 50 Startups.

Dutchie has raised over $600M in funding to date, backed by D1 Capital Partners, Tiger Global, Dragoneer, DFJ Growth, Thrive Capital, Howard Schultz, Snoop Dogg’s Casa Verde Capital, Gron Ventures, members of the founding team at DoorDash, Kevin Durant’s Thirty Five Ventures, and other notable angel investors.

About This Job

We are seeking an experienced and talented Staff Site Reliability Engineer to join our Reliability Engineering team. As a Staff SRE, you will play a key role in ensuring the reliability, scalability, and performance of our infrastructure and applications. You will work closely with cross-functional teams to design, build, and maintain systems that deliver exceptional user experiences and improve the uptime and availability of the company’s products and services.

What You'll Do...

  • Collaborate with development and operations teams to identify and implement solutions for improving system reliability, performance, and availability.
  • Design and implement automation strategies for provisioning, configuration, and monitoring of infrastructure and applications.
  • Lead incident response efforts, ensuring timely and effective resolution of issues and conducting thorough post-mortems for continuous improvement.
  • Utilize tools such as Datadog for observability and Splunk for logging to enhance monitoring, alerting, and logging capabilities.
  • Enable application teams across the company to better instrument and improve observability of their services while also enhancing overall system reliability.
  • Conduct regular performance analysis and capacity planning to proactively address potential issues and optimize system performance.
  • Implement and manage monitoring, alerting, and logging systems to ensure the early detection of issues.
  • Contribute to the design and implementation of disaster recovery and business continuity plans.
  • Stay current with industry trends, emerging technologies, and best practices to continually enhance the reliability and efficiency of our systems.
  • Troubleshoot and resolve complex issues in production environments.
  • Participate in on-call rotation to ensure 24/7 availability of our systems and services.
  • Lead and mentor junior members of the Reliability Engineering team.
  • Continuously identify and implement process improvements to increase efficiency and reduce risk.

What You Bring...

  • Bachelor's degree in Computer Science, Information Technology, or a related field.
  • 8+ years of experience as a Site Reliability Engineer or a related role.
  • Strong expertise in cloud platforms (e.g., AWS, Azure, GCP) and container orchestration (e.g., Kubernetes).
  • Proficient in scripting and automation using languages such as Python, Shell, or Go.
  • Solid understanding of networking, security, and infrastructure-as-code principles.
  • Experience with observability tools such as Datadog and logging solutions like Splunk.
  • Proven track record of successfully leading incident response efforts and conducting post-mortems.
  • Experience in enabling application teams to enhance observability and reliability of their services.
  • Excellent communication and collaboration skills, with the ability to work effectively in a team environment.
  • Excellent problem-solving and troubleshooting skills.

It's a bonus if you...

  • Hold a Master's degree in Computer Science, Computer Engineering, or a related field
  • Have experience with containerization technologies (e.g., Docker, Kubernetes)
  • Have experience with Infrastructure as Code (IaC) tools (e.g., Pulumi, Terraform, CloudFormation)
  • Have experience with agile development methodologies (e.g., Scrum, Kanban)
  • Hold relevant industry certifications (e.g., CKAD)

You’ll Get…

We are targeting a starting salary of $190,000 based on the intended level for this role. There may be flexibility on individual compensation packages based on candidate skill set, experience, qualifications, and other position-related factors.

In addition to cash compensation, our total rewards package includes:

  • Full medical benefits including dental and vision plans to ensure you always have the best care.
  • Equity packages in the form of stock options for all employees.
  • Technology (hardware, software, reading materials, etc.) allowance
  • Flexible vacation and sick days

At Dutchie, we’re committed to providing an environment of mutual respect where equal employment opportunities are available to all applicants and teammates without regard to race, color, religion, sex, sexual orientation, gender identity or expression, pregnancy, age, national origin, disability status, genetic information, protected veteran status, or any other characteristic protected by law. Dutchie believes that diversity and inclusion among our teammates is critical to our success, and we seek to recruit, develop and retain the most talented people from a diverse candidate pool.

Job Profile

Countries

Canada, United States

Benefits/Perks

Equity packages, Flexible vacation and sick days, Full medical benefits including dental and vision plans

Skills

Agile, AWS, Azure, GCP, Go, Infrastructure as Code, Kubernetes, Monitoring, Networking, Python, Security, Shell

Tasks

  • Conduct performance analysis and capacity planning
  • Design and implement automation strategies
  • Enhance monitoring, alerting, and logging capabilities
  • Implement disaster recovery and business continuity plans
  • Implement process improvements
  • Improve system reliability, performance, and availability
  • Lead and mentor junior team members
  • Lead incident response efforts
  • Participate in on-call rotation
  • Stay current with industry trends and technologies
  • Troubleshoot and resolve complex issues

Experience

8+ years

Education

Bachelor's degree in Computer Science, Information Technology, or a related field