A Scoping Review and Evidence Gap Analysis of Clinical AI Fairness
The ethical use of artificial intelligence (AI) in healthcare demands careful attention to fairness. AI fairness means both reducing bias in AI systems and using AI to promote equity in healthcare delivery. Despite progress in technical approaches, a clear gap remains between AI fairness research and real-world clinical practice.
This review identifies where these gaps exist by examining healthcare contexts—such as medical specialties, datasets, and bias-related attributes like gender or race—and how AI fairness techniques address bias detection, evaluation, and mitigation. Key findings show limited AI fairness research in many medical fields, a narrow focus on certain bias attributes, a predominant emphasis on group fairness metrics, and little involvement of clinicians in the AI fairness process. To close these gaps, actionable strategies are proposed to speed up the development of fair AI applications in healthcare, supporting more equitable care for all patients.
Why AI Fairness Matters in Healthcare
Fairness in AI is a critical ethical issue, especially in healthcare where decisions directly affect patient outcomes. AI bias occurs when models systematically favor or disadvantage individuals or groups based on attributes such as race, gender, or socioeconomic status. These biases can appear at any stage of AI development—from data collection to algorithm deployment—and risk worsening health disparities.
Health equity means everyone should have the opportunity to reach their full health potential regardless of social barriers. Without addressing fairness, AI risks reinforcing existing inequalities rather than reducing them. Developing fair AI solutions is complex because fairness depends heavily on the specific clinical context and cannot be solved with one universal approach.
Bias Attributes and Medical Contexts
Bias-relevant attributes go beyond common factors like age, gender, and race; they vary by medical specialty and by how bias arises in each clinical context. For example:
- In dermatology, AI models often underperform for darker skin tones due to underrepresentation in training data.
- In liver transplantation, sex differences can cause bias: the MELD score underestimates kidney dysfunction in women because it uses creatinine levels without adjusting for sex differences (see the formula below), which affects transplant eligibility.
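For context, the standard MELD formula (reproduced here for illustration; creatinine and bilirubin are lab values in mg/dL, and inputs are bounded within set ranges in practice) is:

MELD = 9.57 × ln(creatinine) + 3.78 × ln(bilirubin) + 11.20 × ln(INR) + 6.43

Because women on average have lower muscle mass, and therefore lower baseline creatinine, the same degree of kidney dysfunction produces a lower creatinine value, and thus a lower score, for women.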
Different specialties use different data types, each with unique fairness challenges. Radiology and pathology rely heavily on imaging data, which might seem objective but can contain hidden biases. Mental health depends largely on subjective self-reports and clinician assessments, which introduce additional bias risks.
Measuring Fairness Across Specialties
Fairness metrics vary depending on clinical goals. For example, radiology focuses on equalizing model performance metrics like false positive rates across demographic groups. Liver transplantation emphasizes fairness in terms of medical urgency, ensuring decisions are made based on need rather than socioeconomic factors.
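As a minimal illustration of the radiology-style check, the sketch below (toy data and hypothetical variable names, not taken from the review) computes the false positive rate per demographic group and reports the largest gap:

```python
import numpy as np

def fpr_by_group(y_true, y_pred, groups):
    """False positive rate per group: false alarms among true negatives."""
    rates = {}
    for g in np.unique(groups):
        in_group = groups == g
        negatives = y_true[in_group] == 0
        rates[g] = float(np.mean(y_pred[in_group][negatives] == 1))
    return rates

# Toy data: true labels, model predictions, and a demographic attribute
y_true = np.array([0, 0, 1, 0, 1, 0, 0, 1])
y_pred = np.array([0, 1, 1, 0, 1, 1, 0, 0])
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

rates = fpr_by_group(y_true, y_pred, groups)
print(rates, "gap:", max(rates.values()) - min(rates.values()))
```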
Despite many narrative discussions on AI fairness, there is a lack of comprehensive quantitative analysis linking AI fairness methods directly to clinical demands. This review systematically maps out where research is lacking and how current approaches fall short.
Key Findings from the Evidence Gap Analysis
Medical Fields and Data Types
AI fairness studies are concentrated in a few areas, with limited research in otolaryngology, family medicine, immunology, anesthesiology, hematology, rehabilitation, rheumatology, oral health, and occupational health. The most common data type used is tabular static data, followed by imaging data in specialties like cancer, radiology, and dermatology.
Mental health research showed the most diverse use of data types, including tabular data and text, while dermatology research mainly focused on images. Most studies rely on publicly available datasets, with 241 such datasets identified for use in AI fairness research.
Bias-Relevant Attributes
The most studied attributes are ethnicity/race and gender/sex, followed by age. Socioeconomic status and skin tone also appear but less frequently. Certain attributes are more relevant in specific fields—for example, skin tone in skin cancer studies and location in infectious disease research.
Bias Identification
Of the 467 studies reviewed, 267 focused on identifying bias through literature review, exploratory data analysis, and method comparison. Common approaches include measuring class imbalance and checking for underrepresentation of minority groups. Roughly 200 studies identified bias but did not propose mitigation strategies.
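A common first step is a simple exploratory check of subgroup representation and per-group outcome rates. A minimal sketch, assuming a hypothetical patient DataFrame with made-up column names:

```python
import pandas as pd

# Hypothetical cohort: one row per patient, with a sensitive attribute and a label
df = pd.DataFrame({
    "race": ["White", "White", "White", "Black", "Asian", "Black"],
    "outcome": [1, 0, 0, 1, 0, 1],
})

# Representation: what share of the cohort does each subgroup make up?
print(df["race"].value_counts(normalize=True))

# Class balance: how often does the positive outcome occur in each subgroup?
print(df.groupby("race")["outcome"].mean())
```

Large disparities in either quantity flag groups for which a trained model may underperform.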
Bias Evaluation Metrics
Group fairness is the dominant framework, focusing on equalizing model performance across subgroups using metrics like equal opportunity and statistical parity. Individual fairness and distributional fairness are less commonly addressed.
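For concreteness, the two group-fairness metrics named above can be computed directly. A sketch on toy arrays (illustrative only): the statistical parity difference is the gap in positive prediction rates, and the equal opportunity difference is the gap in true positive rates.

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])   # true labels
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 0])   # model predictions
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

a, b = groups == "A", groups == "B"

# Statistical parity: positive prediction rates should match across groups
spd = y_pred[a].mean() - y_pred[b].mean()

# Equal opportunity: true positive rates should match across groups
tpr_a = y_pred[a & (y_true == 1)].mean()
tpr_b = y_pred[b & (y_true == 1)].mean()
eod = tpr_a - tpr_b

print(f"statistical parity difference: {spd:+.2f}")
print(f"equal opportunity difference:  {eod:+.2f}")
```

A value near zero on either metric indicates parity between the two groups for that criterion.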
Bias Mitigation Approaches
Among studies that attempted to reduce bias, in-process methods (designing models to be inherently fair during training) were most common. Pre-processing techniques such as resampling and reweighting to balance datasets came next, followed by post-processing methods that adjust model outputs after training.
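As one concrete instance of the pre-processing category, classic reweighing (Kamiran and Calders, 2012) assigns each sample a weight equal to its expected frequency, if group and label were independent, divided by its observed frequency. A minimal sketch with hypothetical column names:

```python
import pandas as pd

# Hypothetical cohort with a sensitive attribute and an outcome label
df = pd.DataFrame({
    "sex":   ["F", "F", "M", "M", "M", "M"],
    "label": [1, 0, 1, 1, 1, 0],
})

n = len(df)
# Observed frequency of each (sex, label) combination, per row
observed = df.groupby(["sex", "label"])["label"].transform("size") / n
# Expected frequency if sex and label were statistically independent
expected = (
    (df.groupby("sex")["label"].transform("size") / n)
    * (df.groupby("label")["label"].transform("size") / n)
)

# Reweighing: upweight under-represented (group, label) combinations
df["weight"] = expected / observed
print(df)
# These weights can then be passed to most learners, e.g.:
# model.fit(X, y, sample_weight=df["weight"])
```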
Emerging Trends in Clinical AI Fairness
Explainable AI (XAI) is increasingly used to uncover bias pathways by identifying which attributes influence predictions, making bias easier to understand and mitigate. Clinician involvement, documented in 33 studies, has proved vital for improving AI fairness by bringing domain expertise into the process.
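One simple XAI-style probe of a bias pathway is permutation-based: shuffle a sensitive attribute's column and measure how many predictions change. A minimal sketch, assuming a fitted scikit-learn-style model and a numeric feature matrix (all names here are hypothetical):

```python
import numpy as np

def attribute_reliance(model, X, attr_col, n_repeats=20, seed=0):
    """Estimate how much predictions depend on one column.

    Shuffles the column (e.g., a sensitive attribute or its proxy) and
    measures the fraction of predictions that change; a large value
    suggests the model relies on that attribute.
    """
    rng = np.random.default_rng(seed)
    base = model.predict(X)
    flips = []
    for _ in range(n_repeats):
        X_perm = X.copy()
        rng.shuffle(X_perm[:, attr_col])  # break the attribute's link to the outcome
        flips.append(np.mean(model.predict(X_perm) != base))
    return float(np.mean(flips))

# Usage (hypothetical): reliance = attribute_reliance(fitted_model, X_test, attr_col=2)
```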
Federated learning has also emerged as a promising in-process bias mitigation strategy: it lets AI models train on data from multiple institutions without sharing sensitive records, preserving privacy and reducing the bias that comes from small, single-site datasets.
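At its core, federated learning repeats a simple loop: each institution updates the model locally, and only the parameters, not the patient data, are averaged centrally. A minimal FedAvg sketch on plain numpy weight vectors (purely illustrative; production systems use frameworks such as Flower or TensorFlow Federated):

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """One institution's local step: a few epochs of logistic-regression GD."""
    w = w.copy()
    for _ in range(epochs):
        preds = 1 / (1 + np.exp(-X @ w))          # sigmoid predictions
        w -= lr * X.T @ (preds - y) / len(y)      # gradient step
    return w

def fed_avg(global_w, clients):
    """Average locally trained weights, weighted by local dataset size."""
    updates = [(local_update(global_w, X, y), len(y)) for X, y in clients]
    total = sum(n for _, n in updates)
    return sum(n / total * w for w, n in updates)

# Hypothetical: two hospitals whose patient data never leaves the site
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(40, 3)), rng.integers(0, 2, size=40)) for _ in range(2)]

w = np.zeros(3)
for _ in range(10):            # communication rounds
    w = fed_avg(w, clients)    # only model weights cross institutional boundaries
print("global weights:", w)
```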
Terminology to Know
- Health equity: Everyone reaches their full health potential without social barriers.
- Bias: Systematic unfairness disadvantaging certain groups.
- Fairness: The absence of bias to promote equity.
- AI bias: Bias occurring at any stage of AI development.
- AI fairness: Efforts to detect, evaluate, and mitigate AI bias.
- Bias-relevant attributes: Factors linked to bias in decision-making or resource access.
Conclusion
Fairness in clinical AI is critical to delivering equitable healthcare. While research has grown, significant gaps remain in the range of medical fields studied, the diversity of bias attributes considered, and the integration of fairness techniques tailored to clinical needs. Greater involvement of clinicians and adoption of emerging methods like explainable AI and federated learning can help close these gaps.
Addressing these challenges will enable AI tools to support fairer outcomes and reduce health disparities. For healthcare professionals interested in expanding their AI knowledge, specialized courses can provide practical skills to engage with these issues effectively. Explore relevant AI training options at Complete AI Training.