Applied AI Engineer, Clinical Informatics
Job Description
The Lilly research environment is evolving to centralize the access and analysis of human genetic, omic, and clinical data. This new initiative will define the data, tools, and processes that provide therapy area teams with key evidence for target evaluation and target discovery.
We are seeking a highly specialized Applied AI Engineer / Clinical Informatician to lead research at the intersection of completed clinical trial datasets and biobank-linked population data. This is fundamentally a hands-on research role (not operational trial management), where you will be an individual contributor. Your core mission is to build the systems and tools that extract, define, and contextualize patient phenotypes from locked trial databases, real-world data, and biobank cohorts, turning archived data into translational insight that shapes the next generation of clinical research.
You will work with rich, already-collected datasets: locked trial databases, archived omics profiles, longitudinal electronic health records, and population-scale biobank cohorts. Your mandate is to build the AI and ML systems that make these datasets manageable and ready for detailed analysis. This role suits someone who thinks like a scientist, builds like an engineer, and communicates like a clinician.
Key Responsibilities
AI & Machine Learning for Translational Discovery
- Develop and deploy agentic AI applications that enable natural language interaction with clinical data.
- Ground AI outputs in validated biological knowledge, for example implementing RAG pipelines anchored in biomedical ontologies (HPO, Gene Ontology, MeSH, DrugBank), clinical trial registries, and curated pathway databases.
- Deploy unsupervised and self-supervised learning approaches, such as clustering, representation learning, and contrastive learning, to discover latent patient archetypes and molecular disease subtypes across trial and biobank data.
- Deploy survival models and dynamic treatment regime estimators using combined clinical and omics features.
- Build AI tooling to harmonize heterogeneous trial and biobank datasets into common data representations.
- Evaluate and monitor model performance, safety, and reliability in production environments.
- Manage vendors and contractors as well as partner relationships with relevant teams across Lilly.
Post-Trial Data Research & Analysis
- Build pipelines for locked clinical trial databases (SDTM, ADaM) to conduct secondary and exploratory research beyond primary endpoints.
- Deploy ML workflows to identify trial subgroup effects, treatment heterogeneity, and responder/non-responder signatures from completed trial data.
- Mine adverse event narratives, clinical notes, and investigator comments in biobank and clinical datasets using NLP to surface latent safety signals not captured in structured endpoints.
- Reconstruct patient-level longitudinal trajectories from trial visit data to model disease progression, drug response kinetics, and time-to-event outcomes.
- Architect workflows for meta-analytic and cross-trial integrative analyses across multiple completed studies to identify generalizable biological and clinical patterns.
- Build connections to large-scale biobank cohorts (UK Biobank, All of Us, etc.) as external validation and enrichment resources for trial-derived findings on clinical phenotypes.
Research Rigor, Reproducibility & Governance
- Establish research data management practices that ensure full reproducibility of analyses, including data versioning, containerized compute environments, and audit-ready analysis logs.
- Ensure all research activities follow HIPAA, GDPR, and relevant IRB and ethics committee requirements.
Basic Qualifications
- M.S. in Biomedical Informatics, Computational Biology, Bioinformatics, Statistical Genetics, Epidemiology, or a closely related quantitative field, or an MD/PhD with equivalent depth in translational data science, with 6+ years of research experience working with clinical trial datasets (SDTM/ADaM), biobank data, or large-scale population health data in an academic, pharmaceutical, or research institute setting.
- Or Ph.D. in Biomedical Informatics, Computational Biology, Bioinformatics, Statistical Genetics, Epidemiology, or a closely related quantitative field, or an MD/PhD with equivalent depth in translational data science, with 3+ years of research experience working with clinical trial datasets (SDTM/ADaM), biobank data, or large-scale population health data in an academic, pharmaceutical, or research institute setting.
Additional Skills & Preferences
- Demonstrated use of AI tools in production environments for clinical data analysis.
- Expert proficiency in Python and/or R for statistical modelling and ML; strong command of SQL and experience with cloud-based research computing environments (ideally DNAnexus, AWS, GCP, Azure, or HPC clusters).
- Familiarity with advanced generative AI methods such as fine-tuning LLMs, building and training foundation models from scratch, and working in high-performance computing environments.
- Deep knowledge of CDISC standards (SDTM, ADaM) and experience analyzing clinical trial databases for secondary research purposes.
- Demonstrated experience applying ML methods including survival analysis, causal inference, NLP, and deep learning to clinical or genomic research questions.
- Thorough understanding of OMOP CDM, HL7 FHIR Genomics, and major biomedical ontologies.
- Direct research experience with major public and restricted-access biobank resources (UK Biobank, All of Us, etc.).
- Experience with federated learning, differential privacy, or secure computation frameworks applied to multi-site biomedical research.
- Track record of peer-reviewed publications in clinical AI, translational informatics, genomics, or a related field.
- Familiarity with the target trial framework and its application in biobanks.
- Knowledge of pharmacogenomics, drug response modeling, or PK/PD data analysis from clinical trials.
- Experience with knowledge graph construction, graph ML, or ontology-driven reasoning for biomedical discovery.
- Hands-on experience with multi-omic data analysis.
Compensation: $181,500 – $283,800 (actual compensation depends on candidate education, experience, skills, and geographic location). Full-time equivalent employees are also eligible for a company bonus and a comprehensive benefits program (401(k), pension, medical/dental/vision, life insurance, time off, well-being benefits, etc.).
Requisition: R-105948 · Available in 2 US locations.