Principal, Software Engineer - Conversational AI
(USA) Sunnyvale IV - 680 W. California Ave., Sunnyvale, CA - Home Office, United States
Position Summary...
What you'll do...
The Cortex team builds Walmart's core conversational AI platform, powering the vision of delivering the world's best personal assistants to Walmart's customers, accessible via natural voice commands, text messages, rich UI interactions, and multi-modal experiences that mix all of the above.
We believe conversations are a natural and powerful user interface for interacting with technology, and that they enable richer customer experiences both online and in-store. We are designing and building the next generation of Natural Language Understanding (NLU) services that other teams can easily integrate and leverage to build rich experiences: from pure voice and text shopping assistants (Siri, Sparky), to customer care channels, to mobile apps with rich, intertwined, multi-modal interaction modes (Me@Walmart).
Interested in diving in?
We need solid engineers with the talent and expertise to design, build, improve, and evolve our capabilities in at least some of the following areas:
- Service-oriented architecture that exposes our NLU capabilities at scale and enables increasingly sophisticated model orchestration.
- The service takes in traffic from a large share of Walmart customers (that is 80% of American households!), so you will get to solve non-trivial challenges in service scalability and availability.
- You will design and build the primitives to efficiently orchestrate model-serving microservices, taking their dependencies into account and improving their combined latency and robustness (e.g. fanning out a single request to N services in parallel and replying with whichever answers first; see the sketch after this list).
- You will also bake in functionality that can drive improved machine-learning modeling and experimental design, such as A/B testing.
- Model serving and operations
- There is a constant tension between model improvements (more computation) and model-serving latency, so we are always looking to crunch more numbers while preserving our SLAs and controlling operational costs.
- You will guide our efforts to find the best tradeoffs in architecture, tooling (TensorFlow Serving? ONNX Runtime? Triton?), and infrastructure (CPU or GPU? GCP or Azure?) for model serving, based on the latest model developments and product requirements.
- In particular, you will drive principled, scientific load-testing efforts to clearly identify the tradeoffs at hand and to tune and optimize the model-serving stack (see the load-test sketch after this list).
- If interested, you will also get opportunities to work on prompt engineering and agentic systems.
- Tooling, infrastructure, and pipelines for reproducible workflows and models, …
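To make the fan-out pattern above concrete, here is a minimal sketch in Python using asyncio. The replica names, delays, and payload are hypothetical stand-ins for real HTTP/gRPC calls to model-serving microservices; a production orchestrator would add timeouts, retries, and dependency-aware scheduling on top of this.

```python
# Minimal fan-out/race sketch: send one request to N model-serving
# replicas in parallel and return the first answer, cancelling the rest.
import asyncio

async def call_replica(name: str, delay: float, payload: str) -> str:
    # Hypothetical stand-in for an RPC to one model-serving replica.
    await asyncio.sleep(delay)
    return f"{name} handled {payload!r}"

async def fan_out(payload: str) -> str:
    replicas = [("replica-a", 0.12), ("replica-b", 0.05), ("replica-c", 0.30)]
    tasks = [asyncio.create_task(call_replica(n, d, payload)) for n, d in replicas]
    # Wait for the first replica to answer, then cancel the stragglers
    # so they stop consuming resources.
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    return next(iter(done)).result()

if __name__ == "__main__":
    print(asyncio.run(fan_out("add milk to my cart")))  # replica-b wins
```

The same structure extends naturally to quorum variants (wait for K of N answers) or hedged requests (fan out only after the primary call exceeds a latency budget).

Similarly, the load-testing point above boils down to measuring tail latency under controlled traffic. A minimal closed-loop sketch, where `infer` is a hypothetical placeholder for a real TensorFlow Serving or Triton request:

```python
# Minimal closed-loop load-test sketch: issue N sequential requests and
# report median and 99th-percentile latency. A real harness would also
# drive concurrent, open-loop traffic to expose queueing effects.
import statistics
import time

def infer(text: str) -> str:
    # Placeholder for the real model-serving call.
    return text.upper()

def load_test(n_requests: int = 1000) -> None:
    latencies_ms = []
    for i in range(n_requests):
        start = time.perf_counter()
        infer(f"utterance {i}")
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    p50 = statistics.median(latencies_ms)
    p99 = statistics.quantiles(latencies_ms, n=100)[98]
    print(f"p50={p50:.3f} ms  p99={p99:.3f} ms over {n_requests} requests")

if __name__ == "__main__":
    load_test()
```

Comparing such percentiles across configurations (CPU vs. GPU, batch sizes, serving frameworks) is what makes the tradeoff discussion principled rather than anecdotal.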
Benefits: competitive pay, incentive awards, stock purchase plan, multiple health plans, paid time off (PTO), paid maternity and parental leave, company discounts, great perks, fully remote
Tasks:
- Design and build NLU services
- Integrate labeling tools
- Optimize the model-serving stack
- Orchestrate model-serving microservices
Skills: A/B Testing, AI, Airflow, AWS, Azure, Big Data, BigQuery, C, Classification, Cloud Infrastructure, Communication, Computer Engineering, Continuous Deployment, Conversational AI, Customer Care, Deployment, Diagnostics, Distributed Systems, Docker, English, GCP, Hadoop, Hive, Innovation, Java, Kafka, Kubernetes, Labeling Tools, Machine Learning, Microservices, Model Orchestration, Natural Language Understanding, Presto, Python, Relational Databases, Resource Management, Scala, Scalability, Service-Oriented Architecture, Software Engineering, Spark, SQL, TensorFlow, Writing
Experience: 5 years
Education: Bachelor's or MS degree in Computer Science, Engineering, Mathematics, or a related field
Timezones: America/Anchorage, America/Chicago, America/Denver, America/Los_Angeles, America/New_York, Pacific/Honolulu (UTC-10 to UTC-5)