FreshRemote.Work

Engineer II - Site Reliability (Remote)

USA CA Remote

#WeAreCrowdStrike and our mission is to stop breaches. As a global leader in cybersecurity, our team changed the game. Since our inception, our market-leading cloud-native platform has offered unparalleled protection against the most sophisticated cyberattacks. We work on large-scale distributed systems, processing over 1 trillion events a day with a petabyte of RAM deployed in our Cassandra clusters, and this traffic is growing daily. We’re looking for people with limitless passion, a relentless focus on innovation, and a fanatical commitment to developing and shaping our cybersecurity platform. Consistently recognized as a top workplace, CrowdStrike is committed to cultivating an inclusive, remote-first culture that gives people the autonomy and flexibility to balance the needs of work and life while taking their careers to the next level. Interested in working for a company that sets the standard and leads with integrity? Join us on a mission that matters: one team, one fight.

About the Role:

CrowdStrike is looking to hire an Engineer II for the TechOps SRE team, with a focus on our Commercial Cloud. We’re looking for a deeply technical, hands-on engineer who loves to develop automation and tooling through software to ensure delivery of mission-critical solutions and services for large-scale distributed systems.

What You'll Do:

  • Apply expertise in Linux engineering and administration across thousands of bare-metal servers and virtual machines

  • Own all operational aspects of our platform: availability, latency, throughput, monitoring, issue response (analysis, remediation, deployment), and capacity planning with respect to latency and throughput

  • Work in a team of highly motivated engineers distributed across the globe

  • Participate in an on-call rotation with other team members

  • Troubleshoot server hardware issues

  • Use your passion for technology to ensure our platform operates flawlessly 24x7

  • Obsess about learning, and share the newest technologies and techniques with others, raising the technical IQ of the team. We don’t expect you to know every technology we use, but you should be able to get up to speed on new technology quickly

  • Have broad exposure to our entire architecture and become one of our experts on our overall process flow

  • Have an intrinsic drive to make things better

  • Favor small development projects, with the occasional larger project

  • Work with modern monitoring and telemetry stacks (ELK, Prometheus, Grafana, Zabbix)

  • Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding

  • Lead incident analysis, champion incident response practices, help correlate incidents with systemic problems, and drive them to resolution

What You'll Need:

  • Bachelor's degree in Computer Science and/or equivalent experience

  • A minimum of five years of experience working in a large-scale production environment

  • A minimum of two years of experience in software engineering

  • A minimum of two years of experience in one or more of: C++, Java, Python, Go

  • Experience with storage technologies (Examples: SAN, NAS, NFS, Object Storage, FreeNAS, iSCSI)

  • Experience with infrastructure technologies (Examples: Linux, Windows, VMware, Docker, Kubernetes)

  • Experience writing technical documentation

  • Configuration management experience with one or more tools such as Puppet, Chef, Ansible

  • Solid understanding of application design, including operational trade-offs of various designs

  • Analytical skills coupled with a strong sense of urgency, ownership, and drive

  • Ability to work well in a diverse, team-focused environment with other SREs and Engineers

  • Ability to communicate and present the reliability team's recommended conventions broadly

This role will require the candidate to periodically undergo and pass additional background and fingerprint check(s) consistent with government customer requirements.

Benefits of Working at CrowdStrike:

  • Remote-first culture

  • Market leader in compensation and equity awards

  • Competitive vacation and flexible working arrangements 

  • Comprehensive and inclusive health benefits

  • Physical and mental wellness programs

  • Paid parental leave, including adoption 

  • A variety of professional development and mentorship opportunities

  • Offices with stocked kitchens when you need to fuel innovation and collaboration

We are committed to fostering a culture of belonging where everyone feels seen, heard, valued for who they are, and empowered to succeed. Our approach to cultivating a diverse, equitable, and inclusive culture is rooted in listening, learning, and collective action. By embracing the diversity of our people, we achieve our best work and fuel innovation, generating the best possible outcomes for our customers and the communities they serve.

CrowdStrike is committed to maintaining an environment of Equal Opportunity and Affirmative Action. If you need reasonable accommodation to access the information provided on this website, please contact Recruiting@crowdstrike.com for further assistance.

CrowdStrike participates in the E-Verify program.

CrowdStrike, Inc. is committed to fair and equitable compensation practices. The base salary range for this position in the U.S. is $100,000 - $150,000 per year + variable/incentive compensation + equity + benefits. A candidate’s salary is determined by various factors including, but not limited to, relevant work experience, skills, certifications and location.

Job Profile

Regions

North America

Countries

United States

Benefits/Perks

  • Autonomy and flexibility
  • Competitive vacation and flexible working arrangements
  • Comprehensive and inclusive health benefits
  • Equal Opportunity and Affirmative Action
  • Equity awards
  • Inclusive, remote-first culture
  • Market leader in compensation
  • Offices with stocked kitchens
  • Paid parental leave, including adoption
  • Physical and mental wellness programs
  • Professional development and mentorship opportunities

Skills

Analytical, Ansible, Automation, C, C++, Cassandra, Chef, Cloud-native platform, Collaboration, Configuration Management, Cybersecurity, Distributed Systems, Docker, Documentation, ELK stack, Engineering, Go, Grafana, Incident Response, Java, Kubernetes, Linux, Linux administration, Linux engineering, Operating Systems, Prometheus, Puppet, Python, Systems, VMware, Windows, Writing, Zabbix

Tasks
  • Documentation
  • Fault-finding
  • Incident analysis
  • Linux engineering and administration
  • Monitoring and telemetry
  • Performance tuning
  • Server hardware troubleshooting

Experience

5 years

Education

Bachelor's degree in Computer Science and/or equivalent experience

Restrictions

Remote (remote-first culture)

Timezones

America/Anchorage, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, Pacific/Honolulu (UTC-10, UTC-9, UTC-8, UTC-7, UTC-6, UTC-5)