AI Skin Disease Diagnosis Models Differ in Fairness Across Sex and Age, Study Finds

An international study found that ChatGPT-4 was comparatively fair across sex and age groups when diagnosing skin diseases, while LLaVA showed significant sex-related bias. The researchers plan to include skin tone in future analyses.


Published on: Jul 25, 2025

International Study Highlights Sex and Age Biases in AI Skin Disease Diagnosis

An international research team led by Assistant Professor Zhiyu Wan of ShanghaiTech University has examined bias in AI models that diagnose skin diseases. Published in the journal Health Data Science, the study evaluates how multimodal large language models (LLMs) such as ChatGPT-4 and LLaVA perform when analyzing dermatoscopic images.

The researchers assessed approximately 10,000 images focusing on three common skin conditions: melanoma, melanocytic nevi, and benign keratosis-like lesions. The evaluation spanned various sex and age groups to detect potential disparities in model performance.

Key Findings on Model Performance and Fairness

Both ChatGPT-4 and LLaVA outperformed most traditional deep learning models in diagnosing skin diseases overall.
ChatGPT-4 demonstrated a higher degree of fairness across different sex and age demographics.
LLaVA showed significant sex-related biases, indicating uneven diagnostic accuracy between male and female patients.
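
To make the idea of "uneven diagnostic accuracy" concrete, the following is a minimal Python sketch of one common way to check for a subgroup disparity: compute accuracy separately for each demographic group and compare the results. It is an illustration only. The article does not state which fairness metrics the study used, and the function names, record fields, and toy records here are all hypothetical rather than taken from the paper.

```python
from collections import defaultdict


def accuracy_by_group(records, group_key):
    """Return {group: accuracy} for records carrying 'label', 'prediction', and group_key."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        total[group] += 1
        if rec["prediction"] == rec["label"]:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group):
    """Return the largest accuracy difference between any two groups."""
    values = list(per_group.values())
    return max(values) - min(values)


# Made-up toy records for illustration; these are NOT data from the study.
records = [
    {"sex": "female", "label": "melanoma", "prediction": "melanoma"},
    {"sex": "female", "label": "nevus", "prediction": "nevus"},
    {"sex": "female", "label": "keratosis", "prediction": "nevus"},
    {"sex": "female", "label": "nevus", "prediction": "nevus"},
    {"sex": "male", "label": "melanoma", "prediction": "nevus"},
    {"sex": "male", "label": "nevus", "prediction": "nevus"},
    {"sex": "male", "label": "keratosis", "prediction": "nevus"},
    {"sex": "male", "label": "nevus", "prediction": "nevus"},
]

per_sex = accuracy_by_group(records, "sex")
gap = accuracy_gap(per_sex)
print(per_sex, gap)
```

A gap near zero suggests the model performs similarly across groups on this measure, while a large gap points to the kind of disparity the study reports for LLaVA. A real evaluation would also need far more data per group and a statistical test before calling any difference significant.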

Dr. Wan highlighted the significance of these results: “While large language models like ChatGPT-4 and LLaVA show clear potential in dermatology, addressing biases across sex and age groups is critical to ensure safety and effectiveness for all patients.”

Next Steps for Improving AI Fairness in Dermatology

The research team plans to expand their analysis by including additional demographic variables such as skin tone. This will provide a more comprehensive evaluation of AI fairness and reliability in clinical applications.

These findings offer essential guidance for developing medical AI systems that are equitable and trustworthy, ensuring they serve diverse patient populations effectively.

For further details, see the original publication: Evaluating Sex and Age Biases in Multimodal Large Language Models for Skin Disease Identification from Dermatoscopic Images, Health Data Science, published 1-Apr-2025.
