AI-Enabled Pipeline Accelerates Medical Data Extraction at UT Southwestern
DALLAS – July 28, 2025 – Researchers at UT Southwestern Medical Center have developed an artificial intelligence (AI) pipeline that rapidly and accurately extracts critical information from free-text medical records. Published in npj Digital Medicine, this approach aims to significantly cut down the time needed to prepare detailed datasets for clinical research.
“Building detailed and accurate datasets from narrative medical notes is typically slow and requires extensive manual review,” explained David Hein, M.S., Data Scientist at UT Southwestern’s Lyda Hill Department of Bioinformatics. “Our AI-powered large language models (LLMs) streamline the extraction and standardization of medical data, making large-scale clinical studies more efficient.”
Developing the AI Pipeline
The team trained an LLM on over 2,200 kidney cancer pathology reports to test its ability to identify and categorize tumor types. The development process involved collaboration between AI specialists, pathologists, clinicians, and statisticians to iteratively refine the workflow. Validation against existing electronic medical records (EMR) confirmed the pipeline’s reliability, with 99% accuracy in tumor type identification and 97% accuracy in detecting metastasis.
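The validation step described above, which compares the pipeline's output with values already recorded in the EMR, can be illustrated with a small sketch. This is not the researchers' code. The field names, the report IDs, the helper functions `normalize` and `field_accuracy`, and the toy records are all hypothetical and serve only to show how a per-field accuracy figure might be computed.

```python
# Minimal sketch: per-field accuracy of extracted values against reference values.
# All names and records below are illustrative assumptions, not the study's data.

def normalize(value):
    """Lowercase and trim strings so trivial formatting differences are ignored."""
    if isinstance(value, str):
        return value.strip().lower()
    return value


def field_accuracy(extracted, reference, fields):
    """Return {field: fraction of reports where extracted matches reference}."""
    correct = {f: 0 for f in fields}
    total = {f: 0 for f in fields}
    for report_id, ref_row in reference.items():
        ext_row = extracted.get(report_id, {})  # a missing report counts as wrong
        for f in fields:
            if f not in ref_row:
                continue  # no reference value to compare against
            total[f] += 1
            if normalize(ext_row.get(f)) == normalize(ref_row[f]):
                correct[f] += 1
    return {f: correct[f] / total[f] for f in fields if total[f]}


# Hypothetical reference values (standing in for the EMR) and pipeline output.
reference = {
    "r1": {"tumor_type": "clear cell RCC", "metastasis": False},
    "r2": {"tumor_type": "papillary RCC", "metastasis": True},
    "r3": {"tumor_type": "chromophobe RCC", "metastasis": False},
    "r4": {"tumor_type": "clear cell RCC", "metastasis": True},
}
extracted = {
    "r1": {"tumor_type": "Clear cell RCC ", "metastasis": False},
    "r2": {"tumor_type": "papillary RCC", "metastasis": True},
    "r3": {"tumor_type": "chromophobe RCC", "metastasis": True},
    "r4": {"tumor_type": "clear cell RCC", "metastasis": True},
}

scores = field_accuracy(extracted, reference, ["tumor_type", "metastasis"])
print(scores)
```

In this toy example the tumor type matches in all four reports, and the metastasis flag matches in three of four. A real evaluation would also need rules for synonyms and for reports that lack a reference value.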
“Clinicians often use varied and open-ended language to describe findings, making data extraction challenging,” noted Payal Kapur, M.D., Professor of Pathology and Urology. “Unlike simple yes-no data, narrative reports contain hundreds of nuanced details. With the right training and supervision, AI can quickly and accurately process this information.”
Testing and Collaboration
The pipeline was then tested on a larger dataset of more than 3,500 kidney cancer reports, where it maintained similarly high accuracy. The work drew on the comprehensive data and infrastructure available through UT Southwestern’s Kidney Cancer Program.
“Cross-disciplinary teamwork is essential to fine-tune AI instructions and ensure precision,” said James Brugarolas, M.D., Ph.D., Director of the Kidney Cancer Program and Professor of Internal Medicine. His team, along with other experts, contributed to refining the AI models.
Broader Implications
Although this study focused on kidney cancer, the approach has potential applications for other tumor types and medical fields. “There is no universal model for extracting medical data,” explained Andrew Jamieson, Ph.D., Assistant Professor in Bioinformatics. “Our work outlines practical strategies for applying AI-powered LLMs in various specialties. We plan to continue improving this process and expanding AI’s role in medical research.”
Research Team and Funding
Other researchers who contributed to the study:
- Bingqing Xie, Ph.D., Assistant Professor of Internal Medicine
- Joseph Vento, M.D., Assistant Professor of Internal Medicine
- Lindsay Cowell, Ph.D., Professor, Peter O’Donnell Jr. School of Public Health
- Scott Christley, Ph.D., Computational Biologist
- Ameer Hamza Shakur, Ph.D., Data Scientist/Machine Learning Engineer
- Michael Holcomb, M.S., Lead Data Scientist
- Alana Christie, M.S., Biostatistical Consultant
- Neil Rakheja, Student Intern
- AJ Jain, Ph.D. candidate, Biomedical Engineering
Dr. Kapur holds the Jan and Bob Pickens Distinguished Professorship in Medical Science. Dr. Brugarolas holds the Sherry Wigley Crow Cancer Research Endowed Chair. Several team members are affiliated with the Simmons Cancer Center. Funding came from the National Cancer Institute’s Kidney Cancer Specialized Program of Research Excellence and the Brock Fund for Medical Science Chair in Pathology.
About UT Southwestern Medical Center
UT Southwestern is a leading academic medical center known for integrating biomedical research with clinical care and education. Its faculty includes Nobel laureates, National Academy members, and Howard Hughes Medical Institute Investigators. The institution handles over 140,000 hospitalizations, 360,000 emergency cases, and 5.1 million outpatient visits annually across more than 80 specialties.