Applied AI Engineer, Clinical Informatics

Eli Lilly and Company
Location
Boston, MA
Job Type
Full-time
Reposted
May 24, 2026
Originally posted May 22, 2026
Salary Range
$182k - $284k USD

Job Description

The Lilly research environment is evolving to centralize the access and analysis of human genetic, omic, and clinical data. This new initiative will define the data, tools, and processes that give therapy area teams key evidence for target evaluation and target discovery.

We are seeking a highly specialized Applied AI Engineer (Clinical Informatics) to lead research at the intersection of completed clinical trial datasets and biobank-linked population data. This is fundamentally a hands-on research role (not operational trial management) in which you will be an individual contributor. Your core mission is to build the systems and tools that extract, define, and contextualize patient phenotypes from locked trial databases, real-world data, and biobank cohorts, turning archived data into translational insight that shapes the next generation of clinical research.

You will work with rich, already-collected datasets: locked trial databases, archived omics profiles, longitudinal electronic health records, and population-scale biobank cohorts. Your mandate is to build the AI and ML systems that make these datasets manageable and ready for detailed analysis. This role suits someone who thinks like a scientist, builds like an engineer, and communicates like a clinician.

Key Responsibilities

AI & Machine Learning for Translational Discovery

  • Develop and deploy agentic AI applications that enable natural language interaction with clinical data.
  • Ground AI outputs in validated biological knowledge, for example implementing RAG pipelines anchored in biomedical ontologies (HPO, Gene Ontology, MeSH, DrugBank), clinical trial registries, and curated pathway databases.
  • Deploy unsupervised and self-supervised learning approaches (such as clustering, representation learning, and contrastive learning) to discover latent patient archetypes and molecular disease subtypes across trial and biobank data.
  • Deploy survival models and dynamic treatment regime estimators using combined clinical and omics features.
  • Build AI tooling to harmonize heterogeneous trial and biobank datasets to common data representations.
  • Evaluate and monitor model performance, safety, and reliability in production environments.
  • Manage vendors and contractors as well as partner relationships with relevant teams across Lilly.

Post-Trial Data Research & Analysis

  • Build pipelines for locked clinical trial databases (SDTM, ADaM) to conduct secondary and exploratory research beyond primary endpoints.
  • Deploy ML workflows to identify trial subgroup effects, treatment heterogeneity, and responder/non-responder signatures from completed trial data.
  • Mine adverse event narratives, clinical notes, and investigator comments in biobank and clinical datasets using NLP to surface latent safety signals not captured in structured endpoints.
  • Reconstruct patient-level longitudinal trajectories from trial visit data to model disease progression, drug response kinetics, and time-to-event outcomes.
  • Architect workflows for meta-analytic and cross-trial integrative analyses across multiple completed studies to identify generalizable biological and clinical patterns.
  • Build connections to large-scale biobank cohorts (UK Biobank, All of Us, etc.) as external validation and enrichment resources for trial-derived clinical phenotype findings.

Research Rigor, Reproducibility & Governance

  • Establish research data management practices ensuring full reproducibility of analyses including data versioning, containerized compute environments, and audit-ready analysis logs.
  • Ensure all research activities follow HIPAA, GDPR, and relevant IRB and ethics committee requirements.

Basic Qualifications

  • M.S. in Biomedical Informatics, Computational Biology, Bioinformatics, Statistical Genetics, Epidemiology, or a closely related quantitative field, or an MD/PhD with equivalent depth in translational data science, with 6+ years of research experience working with clinical trial datasets (SDTM/ADaM), biobank data, or large-scale population health data in an academic, pharmaceutical, or research institute setting.
  • Or Ph.D. in Biomedical Informatics, Computational Biology, Bioinformatics, Statistical Genetics, Epidemiology, or a closely related quantitative field, or an MD/PhD with equivalent depth in translational data science, with 3+ years of research experience working with clinical trial datasets (SDTM/ADaM), biobank data, or large-scale population health data in an academic, pharmaceutical, or research institute setting.

Additional Skills & Preferences

  • Demonstrated use of AI tools in production environments for clinical data analysis.
  • Expert proficiency in Python and/or R for statistical modelling and ML; strong command of SQL and experience with cloud-based research computing environments (ideally DNAnexus, AWS, GCP, Azure, or HPC clusters).
  • Familiarity with advanced generative AI methods such as fine-tuning LLMs, building and training foundation models from scratch, and working in high-performance computing environments.
  • Deep knowledge of CDISC standards (SDTM, ADaM) and experience analyzing clinical trial databases for secondary research purposes.
  • Demonstrated experience applying ML methods including survival analysis, causal inference, NLP, and deep learning to clinical or genomic research questions.
  • Thorough understanding of OMOP CDM, HL7 FHIR Genomics, and major biomedical ontologies.
  • Direct research experience with major public and restricted-access biobank resources (UK Biobank, All of Us, etc.).
  • Experience with federated learning, differential privacy, or secure computation frameworks applied to multi-site biomedical research.
  • Track record of peer-reviewed publications in clinical AI, translational informatics, genomics, or a related field.
  • Familiarity with the target trial framework and its application in biobanks.
  • Knowledge of pharmacogenomics, drug response modeling, or PK/PD data analysis from clinical trials.
  • Experience with knowledge graph construction, graph ML, or ontology-driven reasoning for biomedical discovery.
  • Hands-on experience with multi-omic data analysis.

Compensation: $181,500 – $283,800 (actual compensation depends on candidate education, experience, skills, and geographic location). Full-time equivalent employees are also eligible for a company bonus and a comprehensive benefits program (401(k), pension, medical/dental/vision, life insurance, time off, well-being benefits, etc.).

Requisition: R-105948 · Available in 2 US locations.

Frequently Asked Questions

Where is this job located?
This position is located in Boston, MA.
What are the key responsibilities for this role?
You will build AI and ML systems to make datasets manageable and ready for detailed analysis, develop agentic AI applications for clinical data interaction, and establish research data management practices ensuring reproducibility.
What qualifications are required for this position?
You need an M.S. with 6+ years or a Ph.D. with 3+ years of research experience in a relevant quantitative field, working with clinical trial datasets, biobank data, or large-scale population health data.
What is the salary range for this role?
The salary range is $181,500 – $283,800, depending on education, experience, skills, and location. A company bonus and comprehensive benefits program are also included.
What benefits are offered with this position?
Benefits include a 401(k), pension, medical/dental/vision, life insurance, time off, and well-being benefits.
Is visa sponsorship available?
Visa sponsorship information is not provided in the job description.

Job Information

Source: manual
AI Relevance: 85/100 (Highly relevant)
Remote Type: onsite
Experience: Senior
Allowed Locations: Worldwide
Skills & Tags:
AI, machine learning, clinical informatics, clinical trials, SDTM, ADaM, biobank, UK Biobank, All of Us, OMOP
