Mayo Clinic AI detects pancreatic cancer 16 months early as studies show LLMs outperform physicians on complex diagnoses

Mayo Clinic's AI detected pancreatic cancer 475 days before standard diagnosis, with 73% sensitivity versus 39% for radiologists. Separately, an OpenAI model hit 78.3% accuracy on complex emergency cases, outperforming physicians.

Categorized in: AI News, Healthcare
Published on: May 08, 2026

Two Studies Show AI's Potential in High-Stakes Diagnoses

Mayo Clinic researchers have demonstrated that an AI model can detect pancreatic cancer an average of 16 months before radiologists do. Meanwhile, Harvard and Stanford researchers found that advanced language models outperform physicians in diagnosing complex clinical cases in emergency settings.

These findings suggest healthcare systems need to prepare infrastructure and workflows to integrate AI tools into patient care.

AI Detects Pancreatic Cancer Earlier Than Radiologists

Mayo Clinic's REDMOD model identified pancreatic ductal adenocarcinoma in routine CT scans at a median lead time of 475 days, about 16 months before the standard clinical diagnosis. The model analyzed nearly 2,000 scans, including scans from patients later diagnosed with pancreatic cancer that had originally been read as normal.

The AI achieved 73% sensitivity compared to 39% for human radiologists reviewing the same scans. For scans taken more than 24 months before clinical diagnosis, REDMOD detected nearly three times as many early cancers as the radiologists, cancers that would otherwise have gone undetected.
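To make the headline numbers concrete: sensitivity (also called recall) is the fraction of actual cancer cases that a reader correctly flags. The sketch below shows the arithmetic behind the 73% vs 39% comparison; the cohort size and counts are hypothetical, chosen only to reproduce the reported percentages.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of actual cancer cases correctly flagged: TP / (TP + FN)."""
    return true_positives / (true_positives + false_negatives)

# Hypothetical cohort of 100 later-confirmed cancers (counts are illustrative,
# not the study's actual data):
ai_sens = sensitivity(true_positives=73, false_negatives=27)
radiologist_sens = sensitivity(true_positives=39, false_negatives=61)

print(f"AI: {ai_sens:.0%}, radiologists: {radiologist_sens:.0%}")
```

Note that sensitivity alone says nothing about false alarms; a deployed screening tool would also be judged on specificity and positive predictive value.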

The model uses radiomics-a method that applies machine learning to extract detailed data from standard medical images. Mayo Clinic researchers said the results position the tool for testing in high-risk patient populations.
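Radiomics reduces an image region to a vector of quantitative features that a downstream model can learn from. The sketch below computes a few first-order features over a fake region of interest; it is a minimal illustration of the general radiomics idea, not Mayo's REDMOD pipeline, and the ROI values and feature choices are assumptions.

```python
import numpy as np

def first_order_features(roi: np.ndarray) -> dict:
    """First-order radiomic features over a segmented region of interest."""
    values = roi.ravel().astype(float)
    counts, _ = np.histogram(values, bins=32)
    p = counts[counts > 0] / counts.sum()
    return {
        "mean": float(values.mean()),               # average intensity (e.g. HU on CT)
        "std": float(values.std()),                 # crude heterogeneity measure
        "entropy": float(-(p * np.log2(p)).sum()),  # disorder of the intensity histogram
    }

# Hypothetical 16x16 ROI standing in for a patch of a CT slice:
rng = np.random.default_rng(0)
roi = rng.normal(loc=40.0, scale=12.0, size=(16, 16))
features = first_order_features(roi)
print(features)
```

Production radiomics pipelines extract hundreds of features (shape, texture, wavelet-filtered statistics) and feed them to a classifier; the principle is the same as this three-feature toy.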

Advanced Language Models Outperform Physicians in Emergency Cases

Researchers at Harvard Medical School, Beth Israel Deaconess Medical Center, and Stanford University tested OpenAI's o1-series model on complex diagnostic cases. The model achieved 78.3% accuracy on New England Journal of Medicine clinicopathologic conference cases, a longstanding benchmark for diagnostic reasoning.

In 52% of challenging cases, the model's first suggestion was the correct diagnosis. When including secondary suggestions marked as "potentially helpful" or "very close," accuracy reached 97.9%.
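The two accuracy figures correspond to different scoring rules: top-1 accuracy counts only the first suggestion, while the 97.9% figure credits any suggestion in the model's differential that graders judged close enough. The sketch below shows the distinction on a tiny hypothetical set of cases (diagnoses and lists are invented for illustration).

```python
def top1_accuracy(ranked_lists: list[list[str]], truths: list[str]) -> float:
    """Case counts as correct only if the first suggestion matches."""
    return sum(r[0] == t for r, t in zip(ranked_lists, truths)) / len(truths)

def any_match_accuracy(ranked_lists: list[list[str]], truths: list[str]) -> float:
    """Case counts as correct if any listed suggestion matches."""
    return sum(t in r for r, t in zip(ranked_lists, truths)) / len(truths)

# Hypothetical cases: true diagnosis plus the model's ranked differential.
truths = ["pancreatitis", "sepsis", "pulmonary embolism"]
ranked = [
    ["pancreatitis", "cholecystitis"],
    ["pneumonia", "sepsis"],
    ["deep vein thrombosis", "myocardial infarction"],
]

print(f"top-1: {top1_accuracy(ranked, truths):.1%}")      # strict scoring
print(f"any match: {any_match_accuracy(ranked, truths):.1%}")  # lenient scoring
```

In the study the lenient criterion additionally relied on human graders rating near-misses as "potentially helpful" or "very close", which simple string matching cannot capture.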

In head-to-head comparison, the o1-preview model outperformed GPT-4 in 24.3% of cases, while GPT-4 outperformed o1-preview in only 7.1%. The gap widened in emergency department settings, where clinicians have the least information available at initial triage.

Researchers tested the model against real, unstructured clinical data from a major academic emergency department. The AI consistently improved as more patient information became available, matching the pattern seen in human physicians, but with wider performance margins.

Healthcare Systems Should Prepare for AI Integration

The researchers said healthcare systems need to invest in computing infrastructure and design workflows for what they call "clinician-AI interaction." This includes building monitoring frameworks that track not just diagnostic accuracy but also safety, efficiency, and cost.

The Harvard and Stanford researchers noted that while applying AI to clinical decision support carries risks, these tools could reduce the human and financial costs of diagnostic errors and delays. They called for evaluation of these technologies in real-world patient care settings.


