Senior AI/ML Engineer, Machine Learning Operations
Remote
Onebrief is a revolutionary platform for military staff workflows and operational planning. The software is designed to enable smarter, real-time decisions. With unparalleled collaboration features, AI-enhanced tools, and customizable workflows, Onebrief makes staffs superhuman. Its expanding customer roster includes Combatant Commands (COCOMs) and Service Components worldwide.
Onebrief was founded in 2017 by a group of experienced planners. Today, its workforce of 120+ includes veterans from all forces and global organizations, as well as technologists from leading-edge software giants. Onebrief has grown rapidly, raising more than $53M to date from leading venture investors.
What you will achieve
As our second Machine Learning Engineer focused on MLOps, you will play a critical role in establishing and refining our AI/ML deployment pipelines. You will lead MLOps initiatives that improve operational efficiency and ensure the scalability and reliability of machine learning solutions in production. Your contributions will directly impact the deployment and performance of AI models that optimize military operational planning.
You will design and implement scalable systems for model deployment, monitoring, and maintenance. Working closely with our first ML engineer and cross-functional teams, you will ensure that our AI solutions are robust, efficient, and seamlessly integrated into our platform. Your expertise will help us build a reliable ML infrastructure that supports rapid growth and innovation.
About You
You are an ambitious individual with hands-on experience in evaluating, fine-tuning, training, and productionizing modern algorithms and open-source machine learning models. With a pragmatic approach and a strong background in applied machine learning and AI, you are ready to embrace the challenges and rewards of being a pioneering MLOps engineer at a rapidly growing startup.
You excel in building scalable systems and have a bias towards action. Your technical proficiency is complemented by a design thinking approach to problem-solving, allowing you to iteratively develop solutions that meet user needs and drive innovation.
Core skills: Building Scalable Systems, Python, Systems Architecture, SQL/NoSQL Databases, Git Version Control, Container Orchestration, Model Test and Evaluation.
Qualifications
Educational Background: Bachelor's degree in Computer Science, Engineering, or a related field; Master's or Ph.D. is a plus.
Professional Experience: 4+ years in engineering roles with significant MLOps contributions.
Technical Proficiency:
Strong programming skills in Python.
Hands-on experience with containerization and orchestration tools such as Docker and Kubernetes.
Proficient in designing and deploying agentic systems using modern LLM application and model-serving frameworks (e.g., LangChain, vLLM, FastAPI, or KServe).
Experience with systems architecture, SQL/NoSQL databases, and Git version control.
Software Engineering Skills:
Strong proficiency with CI/CD pipelines and distributed computing frameworks such as Ray or Dask.
Familiarity with model monitoring, logging, and versioning tools (e.g., MLflow, Weights & Biases).
Proven expertise in deploying machine learning models in cloud environments (AWS, Azure, or GCP).
Additional Skills (advantageous but not required):
Knowledge of advanced decision theory.
Experience with API building, simulation environments, and reinforcement learning pipelines.
Proficiency with inference runtimes and servers such as ONNX Runtime, NVIDIA Triton Inference Server, or similar technologies.
Soft Skills:
Design thinking approach to problem-solving and iterative development.
Strong problem-solving abilities with a bias towards action.
Excellent communication and collaboration skills.
Ability to work autonomously in a fast-paced, dynamic environment.
Job Profile
- Design and implement scalable systems for model deployment
- Establish and refine AI/ML deployment pipelines
- Lead MLOps initiatives
Experience: 4 years
Education: Bachelor's degree in Computer Science or a related field; Master's degree or Ph.D. is a plus
Timezones: America/Anchorage, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, Pacific/Honolulu (UTC-10, UTC-9, UTC-8, UTC-7, UTC-6, UTC-5)