Lead Machine Learning Infrastructure Engineer
Remote
A bit about Cantina:
Cantina, founded by Sean Parker, is a new social platform with the most advanced AI character creator. Build, share, and interact with AI bots and your friends directly in the Cantina or across the internet.
Cantina bots are lifelike, social creatures, capable of interacting wherever humans go on the internet. Recreate yourself using powerful AI, imagine someone new, or choose from thousands of existing characters. Bots are a new media type that offers creators a way to share infinitely scalable, personalized content experiences combined with seamless group chat across voice, video, and text.
If you're excited about the potential AI has to shape human creativity and social interactions, join us in building the future!
A bit about the role:
We are seeking a Tech Lead to guide the development of our machine learning infrastructure team. This role will be critical in scaling our AI systems, which underpin the creation and deployment of highly interactive, multimodal AI characters. You’ll lead the architecture and implementation of robust ML pipelines while managing the infrastructure needed to support real-time interactions across various platforms.
What you’ll do:
Lead the design, development, and maintenance of scalable machine learning infrastructure for Cantina’s AI-driven applications.
Implement and optimize the deployment of ML models, ensuring low-latency, high-availability performance.
Collaborate cross-functionally with product, engineering, and research teams to integrate AI models into our platform.
Develop robust monitoring and feedback loops to ensure continuous model improvement based on real-world data and user interactions.
Spearhead initiatives to optimize infrastructure for cost, efficiency, and scalability.
Ensure the machine learning infrastructure meets best practices in security and reliability.
A bit about you:
5+ years of experience working with machine learning infrastructure in a production environment, preferably for a consumer-facing product.
2+ years of management experience preferred.
Proven experience leading teams in building scalable ML systems and pipelines.
Expertise with cloud platforms (e.g., AWS) and containerization and orchestration tools (e.g., Docker, Kubernetes).
Strong programming skills, with proficiency in Python and experience with ML frameworks such as TensorFlow or PyTorch.
Experience with monitoring and managing deployed models, using techniques such as A/B testing, telemetry, or model performance tracking.
Excellent communication skills to work with both technical and non-technical stakeholders.
Passion for AI and enthusiasm for its applications in creative and social contexts.
Pay Equity:
In compliance with pay transparency laws, the base salary range for this role is $200,000 to $250,000 for those located in the San Francisco Bay Area, New York City, and Seattle, WA. When determining compensation, a number of factors will be considered, including skills, experience, job scope, location, and competitive compensation market data.
Benefits Summary:
Health Care: 99% of premiums for medical, vision, and dental are paid by Cantina, plus One Medical membership.
Monthly Stipend: $500/month to use on whatever you’d like!
Rest and Recharge: 15 PTO days per year, 9 sick days, 13 paid company holidays, and offices closed for winter break (Christmas Eve to New Year's Day)!
401(k): Eligible to participate on day one of employment.
Parental Leave & Fertility Support
Competitive Salary & Equity
Lunch and snacks provided for in-office employees.
WFH equipment provided for full-time hybrid/remote employees.
Job Profile
Benefits/Perks: 401k participation from day one, Equity, Fertility support, Health Care, Health care premiums covered, Lunch and snacks, Lunch and snacks provided, Monthly stipend, Parental leave, Parental leave and fertility support, PTO, PTO and sick days, Sick Days, WFH equipment, WFH equipment provided, Winter break
Tasks:
- Collaborate with cross-functional teams
- Develop monitoring and feedback loops
- Implement and optimize ML model deployment
- Lead design and development of ML infrastructure
- Optimize infrastructure for cost and efficiency
Skills: A/B Testing, AI, AWS, Cloud, Cloud platforms, Docker, Go, Kubernetes, Machine Learning, Model Performance Tracking, Python, PyTorch, Telemetry, TensorFlow
Experience: 5 years