International Study Highlights Sex and Age Biases in AI Skin Disease Diagnosis
An international research team led by Assistant Professor Zhiyu Wan from ShanghaiTech University has examined sex and age bias in AI models that diagnose skin diseases. Published in the journal Health Data Science, the study evaluates how multimodal large language models (LLMs) such as ChatGPT-4 and LLaVA perform when analyzing dermatoscopic images.
The researchers assessed approximately 10,000 images focusing on three common skin conditions: melanoma, melanocytic nevi, and benign keratosis-like lesions. The evaluation spanned various sex and age groups to detect potential disparities in model performance.
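As a rough illustration of what a subgroup comparison of this kind involves, the Python sketch below computes diagnostic accuracy per demographic group and the largest gap between groups. It is a minimal sketch under stated assumptions, not the study's method: the function names, the record fields, and the toy records are all hypothetical, and the article does not say which fairness metrics the authors used.

```python
from collections import defaultdict


def accuracy_by_group(records, group_key):
    """Return {group: accuracy} for records holding 'label', 'prediction', and group_key."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        group = record[group_key]
        total[group] += 1
        if record["prediction"] == record["label"]:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group):
    """Largest difference in accuracy between any two groups."""
    values = list(per_group.values())
    return max(values) - min(values)


# Hypothetical toy records for illustration only; not data from the study.
records = [
    {"sex": "female", "label": "melanoma", "prediction": "melanoma"},
    {"sex": "female", "label": "nevus", "prediction": "nevus"},
    {"sex": "male", "label": "melanoma", "prediction": "nevus"},
    {"sex": "male", "label": "nevus", "prediction": "nevus"},
]

by_sex = accuracy_by_group(records, "sex")
gap = accuracy_gap(by_sex)
print(by_sex, gap)
```

A gap near zero would suggest similar accuracy across groups, while a large gap would flag a disparity worth investigating. The same functions could be applied to an age-bracket field by passing a different group_key.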
Key Findings on Model Performance and Fairness
- Both ChatGPT-4 and LLaVA outperformed most traditional deep learning models in diagnosing skin diseases overall.
- ChatGPT-4 demonstrated a higher degree of fairness across different sex and age demographics.
- LLaVA showed significant sex-related biases, indicating uneven diagnostic accuracy between male and female patients.
Dr. Wan highlighted the significance of these results: "While large language models like ChatGPT-4 and LLaVA show clear potential in dermatology, addressing biases across sex and age groups is critical to ensure safety and effectiveness for all patients."
Next Steps for Improving AI Fairness in Dermatology
The research team plans to expand their analysis by including additional demographic variables such as skin tone. This will provide a more comprehensive evaluation of AI fairness and reliability in clinical applications.
These findings offer essential guidance for developing medical AI systems that are equitable and trustworthy, ensuring they serve diverse patient populations effectively.
For further details, see the original publication: Evaluating Sex and Age Biases in Multimodal Large Language Models for Skin Disease Identification from Dermatoscopic Images, Health Data Science, published 1-Apr-2025.