Senior AI/ML Data Scientist, Natural Language Processing
USA - Massachusetts - Cambridge (320 Bent Street)
Job Description
Position Description:
Senior Artificial Intelligence and Machine Learning (AI/ML) Data Scientist, Natural Language Processing
The Senior AI/ML Data Scientist – Natural Language Processing (NLP) role involves helping to develop and deploy production-grade NLP products for unstructured and semi-structured data from across our company’s research and development pipeline. These models and workflows will help solve real-world problems and contribute to Artificial Intelligence and Machine Learning (AI/ML) in therapeutic research and development.
Key focus areas will include the scalable deployment of ML and Generative AI approaches (such as Large Language Models, or LLMs) for surfacing insights from proprietary unstructured research data and biomedical literature, as well as developing fit-for-purpose approaches for tasks such as text classification, relation extraction, and entity linking.
Additional job details:
The types of datasets we focus on are both internal (e.g., electronic lab notebooks, safety reports, regulatory documents, clinical results) and external (e.g., public literature and Electronic Medical Records). In addition to new tool development, we often consult with some of our 5,000+ stakeholders (scientists, engineers, regulatory liaisons, data scientists, etc.) on their own projects, as well as additional stakeholders from across our multi-national company. We strive to enhance data science, NLP, and AI literacy across these groups. As part of our work, we have opportunities to coauthor presentations, reports, manuscripts, and/or public code releases.
The position is embedded in a cross-disciplinary team of data scientists, bioinformaticians, and engineers who are all focused on using cutting-edge software, AI/ML, and data science techniques to drive drug discovery and development.
You enjoy:
- Building novel tools that enable the discovery, development, and delivery of new therapeutics to patients in need
- Understanding real-world challenges and developing automated data solutions for them
- Opportunities to directly interact with users of your data science, ML, and AI products
- Evaluating, developing, testing, and deploying new techniques for natural language understanding
- Freedom to propose projects that interest you and to collaborate cross-functionally on delivery
- Staying updated on the newest methods in NLP, ML, and generative AI
- Sharing the approaches you implement and their impact with internal company audiences and externally
Position Qualifications:
Education Minimum Requirement:
- A PhD in data science, AI/ML/LLM engineering, computer science, semantic engineering or a related discipline OR
- An M.S. with 2 years of industry experience OR
- A B.S. with 5 years of industry experience focused on NLP, data science, AI/ML/LLM engineering, computer science, semantic engineering or a related discipline
Required Experience and Skills:
- 2 years of experience with Python, Spark, or related frameworks in AI, machine learning, data science, data engineering, or a similar context
- 1 year of experience with Natural Language Processing, Generative AI, or related techniques for machine understanding of natural language (e.g., written text, omics data, or similar).
Preferred Experience and Skills:
- Fluency in Python programming, version control and collaboration with Git, environment management (e.g., Poetry, conda, Docker), standard Python packages (e.g., pandas, NumPy, Matplotlib), and at least one ML framework (e.g., PyTorch, TensorFlow, fairseq)
- Experience with scalable data engineering frameworks such as Apache Spark and orchestration frameworks such as Airflow, and/or experience with semantic search and retrieval frameworks (e.g., development and benchmarking of embedding models and retrieval approaches in the context of Retrieval Augmented Generation, RAG).
- Experience with ML model deployment and operations (e.g., DevOps, MLOps, LLMOps), including CI/CD workflows and tooling (e.g., GitHub Actions)
- Experience with standard operations on non-relational databases (e.g., Elasticsearch/OpenSearch, MongoDB, Neptune), relational databases (e.g., PostgreSQL), and vector databases (e.g., pgvector, Elasticsearch dense vectors), as well as deployment of APIs and web applications (e.g., Flask, FastAPI, Django, or Dash)
- Working knowledge of statistical learning, such as supervised, unsupervised, and weakly supervised learning, particularly in NLP contexts.
- Working knowledge of NLP and/or Generative AI libraries (e.g., regular expressions, spaCy, LangChain), text annotation tools, and/or semantic frameworks (e.g., RDF triplestores, property graphs, ontology management).
- A demonstrated ability to engage cross-functional teams and stakeholders, including an eagerness to acquire a level of domain knowledge
- Excellent communication, teamwork, didactic, and leadership skills, including skills for scientific communication (authoring scientific articles and presenting) and guidance and mentorship of junior employees and less experienced collaborators
Current Employees apply HERE
Current Contingent Workers apply HERE
US and Puerto Rico Residents Only:
Our company is committed to inclusion, ensuring that candidates can engage in a hiring process that exhibits their true capabilities. Please click here if you need an accommodation during the application or hiring process.
We are an Equal Opportunity Employer, committed to fostering an inclusive and diverse workplace. All qualified applicants will receive consideration for employment without regard to race, color, age, religion, sex, sexual orientation, gender identity, national origin, protected veteran status, or disability status, or other applicable legally protected characteristics. For more information about personal rights under the U.S. Equal Opportunity Employment laws, visit:
Pay Transparency Nondiscrimination
We are proud to be a company that embraces the value of bringing diverse, talented, and committed people together. The fastest way to breakthrough innovation is when diverse ideas come together in an inclusive environment. We encourage our colleagues to respectfully challenge one another’s thinking and approach problems collectively.
Learn more about your rights, including under California, Colorado and other US State Acts
U.S. Hybrid Work Model
Effective September 5, 2023, employees in office-based positions in the U.S. will be working a Hybrid work model consisting of three total days on-site per week, Monday - Thursday, although the specific days may vary by site or organization, with Friday designated as a remote-working day, unless business critical tasks require an on-site presence. This Hybrid work model does not apply to, and daily in-person attendance is required for, field-based positions; facility-based, manufacturing-based, or research-based positions where the work to be performed is located at a Company site; positions covered by a collective-bargaining agreement (unless the agreement provides for hybrid work); or any other position for which the Company has determined the job requirements cannot be reasonably met working remotely. Please note, this Hybrid work model guidance also does not apply to roles that have been designated as “remote”.
The Company is required to provide a reasonable estimate of the salary range for this job in certain states and cities within the United States. Final determinations with respect to salary will take into account a number of factors, which may include, but not be limited to the primary work location and the chosen candidate’s relevant skills, experience, and education.
Expected US salary range:
$122,800.00 - $193,300.00
Available benefits include bonus eligibility, long term incentive if applicable, health care and other insurance benefits (for employee and family), retirement benefits, paid holidays, vacation, and sick days. A summary of benefits is listed here.
San Francisco Residents Only: We will consider qualified applicants with arrest and conviction records for employment in compliance with the San Francisco Fair Chance Ordinance
Los Angeles Residents Only: We will consider for employment all qualified applicants, including those with criminal histories, in a manner consistent with the requirements of applicable state and local laws, including the City of Los Angeles’ Fair Chance Initiative for Hiring Ordinance
Search Firm Representatives Please Read Carefully
Merck & Co., Inc., Rahway, NJ, USA, also known as Merck Sharp & Dohme LLC, Rahway, NJ, USA, does not accept unsolicited assistance from search firms for employment opportunities. All CVs / resumes submitted by search firms to any employee at our company without a valid written search agreement in place for this position will be deemed the sole property of our company. No fee will be paid in the event a candidate is hired by our company as a result of an agency referral where no pre-existing agreement is in place. Where agency agreements are in place, introductions are position specific. Please, no phone calls or emails.
Employee Status: Regular
Relocation: Domestic
VISA Sponsorship: No
Travel Requirements: 10%
Flexible Work Arrangements: Hybrid
Shift: 1st - Day
Valid Driving License: No
Hazardous Material(s): n/a
Required Skills: Business Intelligence (BI), Database Design, Data Engineering, Data Modeling, Data Science, Data Visualization, Machine Learning, Software Development, Stakeholder Relationship Management, Waterfall Project Management
Preferred Skills:
Job Posting End Date: 11/22/2024*
*A job posting is effective until 11:59:59 PM on the day BEFORE the listed job posting end date. Please ensure you apply to a job posting no later than the day BEFORE the job posting end date.