AI Models Outperform Doctors at Summarizing Cancer Pathology Reports
Large language models generated more complete summaries of complex cancer pathology reports than physicians, according to a study from Northwestern Medicine published in JCO Clinical Cancer Informatics. Researchers tested six open-source models from Meta, Google, DeepSeek, and Mistral AI on 94 lung cancer pathology reports.
The gap in completeness was most pronounced for molecular and genetic findings. A panel of oncologists assessed the summaries for accuracy, completeness, and clinical risk.
Why This Matters for Oncology
Cancer pathology reports have grown increasingly complex. As biomarker testing expands and patients live longer, reports now span multiple institutions and contain detailed histopathological, immunohistochemical, and molecular data. Clinicians must synthesize this information under time pressure.
"As cancer care becomes increasingly complex, the burden of synthesizing complex reports is growing rapidly," said Dr. Mohamed Abazeed, chair of radiation oncology at Northwestern University Feinberg School of Medicine. "AI can help ensure critical pathological and genomic details are consistently captured - not as a replacement for physicians, but as a tool to augment clinical decision-making."
Which Models Performed Best
The strongest performers were DeepSeek-R1 and Meta's Llama 3.1. All six models tested (Llama 3.0, 3.1, and 3.2; Google's Gemma 9B; Mistral 7.2B; and DeepSeek-R1) are open-source systems that researchers can download and run locally, not consumer chatbots.
The models analyzed text content from pathology reports and generated structured summaries capturing microscopic tumor characteristics, protein expression data, and genetic information relevant to treatment decisions.
Next Steps: Clinical Validation
Northwestern is developing an app using Llama 3.1 that would allow physicians to upload pathology reports and receive AI-generated summaries for review. The team emphasizes that deployment requires additional testing and validation.
"Patients with complex cancers might benefit the most," said Dr. Yirong Liu, first author of the study. "In cases where missing a key pathological finding or an actionable genetic marker could change treatment decisions, ensuring that information is consistently captured is critical."
Patients living longer with cancer often undergo repeated biopsies and genetic sequencing, and their reports can span dozens of pages. A single missed detail can affect treatment choices, and this is where AI may provide practical support.