AI and Health Care: Assessing Patient Risks and Medical Coding with Large Language Models
Evaluating Clinical Tasks with AI
The study focused on five LLMs: DeepSeek-R1 and OpenAI-O3 (reasoning models), alongside ChatGPT-4, Gemini-1.5, and LLaMA-3.1 (non-reasoning models). Researchers tested these models using 300 hospital discharge summaries, providing them with structured clinical data such as chief complaints, medical and surgical history, lab results, and imaging reports.
The goal was to see whether these AI systems could accurately perform three tasks (a rough prompting sketch follows the list):
- Generate primary diagnoses
- Predict ICD-9 medical codes
- Stratify hospital readmission risk
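As a rough illustration of this setup, the sketch below assembles one prompt from structured discharge-summary fields and requests all three outputs. It is a minimal sketch, not the study's actual protocol: the field names, prompt wording, risk labels, and use of the OpenAI Python client are all assumptions.

```python
# Minimal sketch of the evaluation setup, assuming the OpenAI Python
# client and hypothetical field names; the study's actual prompts and
# data schema are not reproduced here.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def build_prompt(summary: dict) -> str:
    """Flatten structured discharge-summary fields into one prompt."""
    return (
        f"Chief complaint: {summary['chief_complaint']}\n"
        f"Medical/surgical history: {summary['history']}\n"
        f"Lab results: {summary['labs']}\n"
        f"Imaging reports: {summary['imaging']}\n\n"
        "1. State the most likely primary diagnosis.\n"
        "2. Predict the primary ICD-9 code.\n"
        "3. Stratify hospital readmission risk as low, medium, or high."
    )

def query_model(summary: dict, model: str = "gpt-4") -> str:
    """Send the prompt to a chat model and return its raw answer."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": build_prompt(summary)}],
    )
    return response.choices[0].message.content
```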
How Did the Models Perform?
Primary Diagnosis Generation
This was the strongest area for LLMs. Among non-reasoning models, LLaMA-3.1 led with an 85% accuracy rate, followed closely by ChatGPT-4 at 84.7%. OpenAI-O3, a reasoning model, outperformed all with a 90% accuracy rate.
ICD-9 Medical Code Prediction
All models struggled here. Accuracy dropped sharply, with LLaMA-3.1 at 42.6%, ChatGPT-4 at 40.6%, and Gemini-1.5 lagging at just 14.6%. The reasoning model OpenAI-O3 scored 45.3%, slightly better but still limited.
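For a concrete sense of how such accuracy figures can be computed, here is one possible scoring scheme. The matching criterion is an assumption: this sketch reports both exact-match accuracy on the full code and a looser match on the three-digit ICD-9 category, since evaluations differ on which they use.

```python
def icd9_category(code: str) -> str:
    """Three-digit ICD-9 category, e.g. '428.0' -> '428'."""
    return code.split(".")[0]

def score_predictions(pairs: list[tuple[str, str]]) -> dict[str, float]:
    """pairs: (predicted_code, gold_code) for each discharge summary."""
    n = len(pairs)
    exact = sum(pred == gold for pred, gold in pairs)
    category = sum(icd9_category(pred) == icd9_category(gold)
                   for pred, gold in pairs)
    return {"exact_accuracy": exact / n, "category_accuracy": category / n}

# Two of three exact matches, all three in the right category:
print(score_predictions([("428.0", "428.0"),
                         ("250.00", "250.00"),
                         ("401.9", "401.1")]))
# -> {'exact_accuracy': 0.666..., 'category_accuracy': 1.0}
```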
Hospital Readmission Risk Stratification
Risk prediction was challenging. Non-reasoning models hovered around 33-41% accuracy. Reasoning models performed better, with DeepSeek-R1 achieving 72.66% and OpenAI-O3 close behind at 70.66%.
Key Insights for Healthcare Professionals
While LLMs show promising capabilities, especially in diagnosis generation, their current limitations in medical coding and risk prediction mean they cannot replace human expertise. Errors in coding can lead to billing mistakes and flawed analytics, while inaccurate readmission risk assessments may impact patient safety and discharge planning.
Liability and transparency remain concerns when AI systems make mistakes or generate misleading information, raising questions about accountability among developers, clinicians, and healthcare providers.
The Path Forward
Reasoning models generally outperformed non-reasoning ones, offering better interpretability, modest gains in diagnosis and coding accuracy, and a substantial advantage in readmission risk stratification. Still, reliability issues persist. The study suggests that improving LLM performance requires:
- Task-specific fine-tuning on clinical datasets
- Hybrid human-AI workflows to combine strengths (see the sketch after this list)
- Bias detection and correction mechanisms
- Continuous monitoring and governance frameworks
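As one way to picture a hybrid human-AI workflow, the sketch below auto-accepts a model output only when a confidence score clears a threshold and otherwise flags it for clinician review. The confidence score and threshold are hypothetical; the study does not prescribe a specific mechanism.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    task: str          # "diagnosis", "icd9", or "readmission_risk"
    value: str
    confidence: float  # hypothetical calibrated score in [0, 1]

def route(pred: Prediction, threshold: float = 0.9) -> str:
    """Auto-accept only high-confidence outputs; otherwise escalate."""
    if pred.confidence >= threshold:
        return f"auto-accept {pred.task}: {pred.value}"
    return f"flag {pred.task} for clinician review: {pred.value}"

print(route(Prediction("icd9", "428.0", 0.62)))               # flagged for review
print(route(Prediction("diagnosis", "heart failure", 0.95)))  # auto-accepted
```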
Future research will explore repeated trials, larger datasets, and further tuning to boost stability and safety in clinical applications.
Balancing Optimism with Caution
As AI continues to influence healthcare delivery, it's crucial to integrate these tools responsibly. Combining AI with human oversight can improve clinical workflows, but safeguards must protect patient safety and maintain trust.
Healthcare professionals interested in expanding their knowledge of AI applications in clinical settings can explore specialized training and courses. For practical AI skills and certification options, visit Complete AI Training.