AI Models Excel at Final Diagnosis but Struggle With Early Clinical Reasoning
Large language models can identify the correct diagnosis in more than 90% of cases when given complete patient information, but they fail at the foundational reasoning that doctors use to navigate uncertainty, according to research published in JAMA Network Open.
A study from Mass General Brigham evaluated 21 AI models across structured patient scenarios and found a critical gap: models perform well at confirming diagnoses but struggle to generate them.
Where Models Fall Short
The weakness emerges early in the diagnostic process. When physicians have incomplete information, they generate a differential diagnosis, a list of possible conditions that guides further testing and decision-making. In most cases, the models failed to produce a reliable differential.
"These models are great at naming a final diagnosis once the data is complete, but they struggle at the open-ended start of a case, when there isn't much information," said Arya Rao, lead author and M.D.-Ph.D. student at Harvard Medical School.
The problem runs deeper than accuracy metrics suggest. AI systems tend to converge too quickly on a single answer rather than maintaining uncertainty and exploring alternatives, the opposite of how physicians approach ambiguous cases.
A New Way to Measure Performance
Researchers developed an evaluation framework that assesses performance across multiple stages of care: initial hypothesis generation, test selection, final diagnosis, and treatment planning. Traditional accuracy metrics, which score only the final answer, mask weaknesses in these intermediate reasoning steps.
Performance improved notably as additional structured data (lab results, imaging) was introduced, suggesting the models rely heavily on complete inputs to reach accurate conclusions.
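To make the idea concrete, here is a minimal Python sketch of this kind of staged evaluation, in which case information is revealed one stage at a time and the model is scored at each step. The stage names, the sample case, and the query_model function are illustrative assumptions, not the study's actual protocol, data, or code.

def query_model(prompt: str) -> str:
    """Stand-in for a real LLM call (e.g., a chat-completion API)."""
    return "pulmonary embolism"  # placeholder response for the sketch

# Hypothetical case, revealed stage by stage as in a real workup.
CASE = {
    "triage": "54-year-old with chest pain and shortness of breath.",
    "history_exam": "Pain worsens on inspiration; recent long-haul flight.",
    "labs_imaging": "Elevated D-dimer; CT angiogram shows filling defect.",
}
TRUE_DIAGNOSIS = "pulmonary embolism"

def evaluate_staged(case: dict, truth: str) -> dict:
    """Score the model after each stage of information is revealed."""
    revealed, scores = [], {}
    for stage, info in case.items():
        revealed.append(info)
        prompt = (
            "Given the information so far, list a differential diagnosis:\n"
            + "\n".join(revealed)
        )
        answer = query_model(prompt).lower()
        # Crude stage-level metric: does the differential contain the truth?
        scores[stage] = truth in answer
    return scores

print(evaluate_staged(CASE, TRUE_DIAGNOSIS))

Comparing the per-stage scores is what exposes the gap the study describes: a model can score well at the final labs_imaging stage while failing at the open-ended triage stage.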
Newer model versions generally outperformed earlier ones, but the underlying limitations in clinical reasoning remained consistent.
What This Means for Healthcare Organizations
The findings cut both ways. High rates of correct final diagnoses reinforce AI's potential as a clinical support tool, particularly in data-rich environments where comprehensive patient information is available.
But the inability to reliably navigate early-stage diagnostic reasoning raises concerns about overreliance. Real-world medicine often involves incomplete information and ambiguity, the exact conditions where these models struggle most.
Current AI systems are not ready to operate independently in clinical environments. They should augment human judgment, not replace it.
"We want to help separate the hype from the reality of these tools as they apply to health care," Rao said. "Our results reinforce that large language models in healthcare continue to require a human-in-the-loop and very close oversight."