Senior Site Reliability Engineer, Cloud Platform (Remote, US)

New York City

Full Time Senior-level / Expert
Paperspace
Cloud Machine Learning, AI, and effortless GPU infrastructure

About Paperspace
Paperspace builds tools and infrastructure to make accelerated computing simple and accessible.
Paperspace is backed by leading investors including Y Combinator, Initialized Capital, Battery Ventures, and Intel Capital.
The Role
The Cloud Platform team is responsible for the infrastructure and services that make up Paperspace’s cloud. The team manages our hardware infrastructure, internal networks, and user-facing products such as the shared file system service, object store, managed Kubernetes, VPCs, and compute. The team’s goal is to provide a reliable and scalable platform on which both users and Paperspace can run their applications easily.
What we're looking for
• Strong interest in development platforms, MLOps, CI/CD, infrastructure, or making products for technical teams
• 6+ years of relevant industry experience in a fast-paced, high-growth tech environment, building and scaling infrastructure using engineering practices with languages such as Python or Go
• Experience with systems, Linux, networking, storage, monitoring, and alerting
What you'll be doing
• Work with Python and Go
• Proactively address reliability, scalability, and security concerns by adding alerts, monitoring, and new processes with high autonomy
• Use Ansible for configuration management of bare-metal and VM hosts
• Manage Kubernetes clusters that host internal services
• Manage our network and storage infrastructure using automation
• Participate in an on-call rotation
• Collaborate with other engineers to find elegant architectures and solutions
Technical problems the team has worked on
• Implemented a Prometheus-compatible metrics store for monitoring and alerting
• Implemented CI/CD to manage software deployments to Kubernetes
• Set up and managed monitoring for BGP inconsistencies with providers
• Created a Ceph cluster to provide POSIX-compliant shared file systems and object stores for users
• Supported dynamic internal network provisioning that powers user VPCs and Gradient VPCs
Our Team
Paperspace values technical excellence in an open and inclusive environment. The team is primarily based in NYC, but we have a strong remote/hybrid team. Communication is paramount and mutual respect is at the core of our collaborative work environment. We are also committed to building a team that represents a variety of backgrounds, perspectives, and skills. We believe creating a more diverse team directly impacts our ability to collaborate effectively, build a better community, and produce better products.
Benefits
• Multiple health care insurance options with premium plans in addition to vision and dental insurance plans
• 401(k) Plan with employer matching
• Commuter benefits with a contribution from the company
• Responsible Time Off Policy
• Generous and flexible parental leave
• Fitness & wellness benefit
• Remote-friendly and hybrid office environment for New York team members
We are an equal opportunity employer that values and welcomes diversity. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status.
Job region(s): North America