Johns Hopkins researchers launch two Cancer AI Alliance projects
11/10/2025
Researchers at the Johns Hopkins Kimmel Cancer Center and Whiting School of Engineering announced two projects under the Cancer AI Alliance (CAIA) that show how privacy-first AI can improve cancer research and care. As the only research university and comprehensive health system in CAIA, Johns Hopkins is coordinating expertise across medicine, engineering, and public health via the Kimmel Cancer Center, the Malone Center for Engineering in Healthcare, the Data Science and AI Institute, and the inHealth Precision Medicine program.
Why this matters
CAIA, founded in 2024, is built on federated learning so participating centers keep data local while sharing models and insights. "AI tools travel to the data, not the other way around," says Vasan Yegnasubramanian, M.D., Ph.D., professor of oncology, pathology, and radiation oncology and molecular radiation sciences, and director of inHealth Precision Medicine. The approach protects privacy and makes multi-institution collaboration feasible at scale.
- Data never leaves institutional boundaries; models are trained across sites.
- Models trained across many sites are more likely to generalize across patient populations, while each institution keeps its own governance and compliance controls.
- Teams can iterate faster with shared infrastructure and standardized evaluation.
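To make the "tools travel to the data" idea concrete, here is a minimal sketch of federated averaging (FedAvg), one common federated learning algorithm. The article does not say which aggregation method CAIA uses, so this is an illustration under stated assumptions rather than the alliance's implementation. The site names, the toy one-variable linear model, the synthetic data, and the functions `local_update`, `federated_average`, and `train` are all hypothetical.

```python
from typing import Dict, List, Tuple

Weights = List[float]             # [slope, intercept] of a toy linear model
Dataset = List[Tuple[float, float]]  # (x, y) pairs held privately by one site


def local_update(weights: Weights, data: Dataset, lr: float = 0.1) -> Weights:
    """Run one gradient-descent step on a single site's private data."""
    w, b = weights
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return [w - lr * grad_w, b - lr * grad_b]


def federated_average(site_weights: List[Weights], site_sizes: List[int]) -> Weights:
    """Combine site models, weighting each by its number of records."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(dim)
    ]


def train(sites: Dict[str, Dataset], rounds: int = 300) -> Weights:
    """Only model weights cross site boundaries; the (x, y) records never do."""
    global_weights: Weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in sites.values()]
        sizes = [len(data) for data in sites.values()]
        global_weights = federated_average(updates, sizes)
    return global_weights


# Synthetic data drawn from y = 2x + 1, split across two hypothetical sites.
SITES: Dict[str, Dataset] = {
    "site_a": [(0.0, 1.0), (1.0, 3.0)],
    "site_b": [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
}

if __name__ == "__main__":
    slope, intercept = train(SITES)
    print(f"slope={slope:.3f} intercept={intercept:.3f}")
```

In this toy setup the aggregated model should approach a slope of 2 and an intercept of 1 even though neither site ever sees the other's records. Production systems typically add secure aggregation, authentication, and audit logging on top of this basic loop.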
Project 1: Large language model learns patient trajectories from EHRs
Led by Mathias Unberath, Ph.D., Jeff Weaver, Ph.D., Vasan Yegnasubramanian, M.D., Ph.D., and Alexis Battle, Ph.D., the team is fine-tuning a large language model on structured electronic health record data. The goal: model a patient's timeline to anticipate future diagnoses, treatments, or test results for cancer care.
The effort combined faculty, students, research IT, and software engineers from the Data Science and AI Institute to move from concept to execution on an aggressive timeline. Early work focuses on temporal pattern learning, label scarcity, and evaluation that accounts for data drift and site variability.
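The article does not describe how the team encodes records for the model. As a rough illustration only, the sketch below shows one way structured events could be flattened into an ordered token sequence with explicit time gaps, then split into (history, next event) pairs for fine-tuning. The `Event` class, the token format, the event codes, and both functions are assumptions made for this example, and the sample timeline is synthetic.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    when: date
    kind: str  # e.g. "diagnosis", "lab", "treatment" (hypothetical categories)
    code: str  # hypothetical placeholder code, not a real vocabulary


def serialize(events: List[Event]) -> List[str]:
    """Sort events by date and insert a gap token between consecutive events."""
    ordered = sorted(events, key=lambda e: e.when)
    tokens: List[str] = []
    previous = None
    for event in ordered:
        if previous is not None:
            tokens.append(f"<gap:{(event.when - previous).days}d>")
        tokens.append(f"{event.kind}:{event.code}")
        previous = event.when
    return tokens


def next_event_pairs(tokens: List[str]) -> List[Tuple[str, str]]:
    """Build (history, target) pairs where the target is the next clinical event."""
    pairs: List[Tuple[str, str]] = []
    for i, token in enumerate(tokens):
        if i > 0 and not token.startswith("<gap:"):
            pairs.append((" ".join(tokens[:i]), token))
    return pairs


# Synthetic timeline, deliberately listed out of order.
TIMELINE = [
    Event(date(2024, 3, 1), "treatment", "DRUG_X"),
    Event(date(2024, 1, 10), "diagnosis", "GLIOMA"),
    Event(date(2024, 1, 24), "lab", "IDH1_POS"),
]

if __name__ == "__main__":
    for history, target in next_event_pairs(serialize(TIMELINE)):
        print(f"{history!r} -> {target!r}")
```

Keeping the elapsed time as its own token is one simple way to expose temporal patterns to a sequence model; real pipelines would also need to handle coding vocabularies, de-identification, and missing values.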
Project 2: Multicenter study of IDH-mutant glioma and astrocytoma
Led by Karisa Schreck, M.D., Ph.D., Jessie Tong, Ph.D., Taxiarchis Botsis, M.Sc., Ph.D., Weaver, and Yegnasubramanian, this study aggregates insights across institutions for rare IDH-mutant glioma and astrocytoma. Using CAIA's multi-cloud federated platform, the team can study real-world practice patterns and outcomes from new precision therapies without pooling raw data.
"The insights will help doctors individualize care, identifying patients most likely to benefit from targeted drugs and guiding if and when other treatments are needed," says Yegnasubramanian. The study builds on Johns Hopkins researchers' 2008 discovery of IDH gene mutations in gliomas and on the 2024 FDA approval of vorasidenib for IDH-mutant low-grade glioma.
Scale, collaborators, and what's next
These two studies are among eight projects launched under CAIA, which also includes Dana-Farber Cancer Institute, Fred Hutch Cancer Center, and Memorial Sloan Kettering Cancer Center. Over the next year, CAIA plans to add more centers and more models, with applications ranging from treatment response prediction to rare cancer studies.
"CAIA allows us to innovate across the full spectrum of cancer," says Yegnasubramanian. "On one end, we can develop foundational models trained on data from more than a million patients. On the other, we can study rare cancers and develop AI models that improve therapy for patients who previously had little guidance."
Technical notes for researchers
- Federated training: model updates are aggregated across sites; raw data stays local.
- Multi-cloud setup: supports heterogeneous institutional infrastructure while maintaining secure orchestration.
- Evaluation: prospective and site-stratified evaluation plans are critical for transportability.
- Governance: clear audit trails and model cards help with reproducibility and regulatory readiness.
- EHR modeling: temporal representation learning and missingness-aware features improve trajectory prediction.
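As an illustration of the site-stratified evaluation mentioned in the notes above, the following sketch reports accuracy per site alongside the pooled figure, so that a weak site is not hidden by a strong average. It is a minimal example under stated assumptions: the record layout, the site names, the synthetic labels, and the function names are hypothetical, and accuracy stands in for whatever metric a real study would choose.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, int, int]  # (site, true label, predicted label)


def site_stratified_accuracy(records: List[Record]) -> Dict[str, float]:
    """Compute accuracy separately for each contributing site."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for site, y_true, y_pred in records:
        total[site] += 1
        correct[site] += int(y_true == y_pred)
    return {site: correct[site] / total[site] for site in total}


def pooled_accuracy(records: List[Record]) -> float:
    """Compute accuracy over all records, ignoring site membership."""
    return sum(int(y_true == y_pred) for _, y_true, y_pred in records) / len(records)


def worst_site(per_site: Dict[str, float]) -> Tuple[str, float]:
    """Return the site with the lowest accuracy and its score."""
    return min(per_site.items(), key=lambda item: item[1])


# Synthetic predictions from two hypothetical sites.
RECORDS: List[Record] = [
    ("site_a", 1, 1), ("site_a", 0, 0), ("site_a", 1, 1), ("site_a", 0, 1),
    ("site_b", 1, 0), ("site_b", 0, 0),
]

if __name__ == "__main__":
    per_site = site_stratified_accuracy(RECORDS)
    print("pooled:", round(pooled_accuracy(RECORDS), 3))
    print("per site:", per_site)
    print("worst site:", worst_site(per_site))
```

Reporting the worst-performing site next to the pooled score is one simple check on transportability; a fuller plan would add confidence intervals and prospective validation.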
Support and partners
CAIA receives financial and technical support from Amazon Web Services (AWS), Deloitte, Ai2 (Allen Institute for AI), Google, Microsoft, NVIDIA, and Slalom. The collaboration pairs domain expertise with modern AI tooling to address clinical questions that previously lacked large, diverse datasets.
Further resources
- Federated learning (technical overview)