Mount Sinai's AEquity Cuts Bias in Health AI Data by Up to 96.5%
Mount Sinai's AEquity targets data bias at the source to guide curation and relabeling for consistent results across patient groups. In tests, bias dropped by up to 96.5% in imaging.

New AI method reduces bias in health datasets
Mount Sinai researchers introduced an AI method that directly targets data bias, one of the biggest drivers of unfair performance in clinical algorithms. The approach, called AEquity, improves how datasets are built and labeled so downstream models produce more consistent results across patient groups.
The study, published in the Journal of Medical Internet Research, evaluated the method across medical images, patient records, and national survey data. For clinical teams, this is a practical way to tighten diagnostic accuracy while reducing inequities at the source: the data.
How AEquity works
AEquity uses a learning-curve approximation to spot where model performance gaps come from and then guides dataset collection or relabeling to close those gaps. In short, it shows you what data you're missing, or which labels need correction, by subgroup and by finding.
This shifts fairness work from late-stage fixes to an upstream, data-first process that can be integrated into routine development and pre-deployment audits.
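The paper is summarized here at a high level, so the sketch below is only a rough illustration of the learning-curve idea, not AEquity's actual implementation. It assumes placeholder inputs (tabular features `X`, labels `y`, a `groups` array) and a common power-law learning-curve form: train on growing samples, track AUC on a held-out slice of one subgroup, then project whether more data for that subgroup would close its gap.

```python
# Illustrative sketch of a per-subgroup learning-curve diagnostic.
# X, y, groups, and the power-law form are assumptions, not AEquity's API.
import numpy as np
from scipy.optimize import curve_fit
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def power_law(n, a, b, c):
    # Empirical learning-curve form: error decays as a power of training-set size.
    return a * np.power(n, -b) + c

def subgroup_learning_curve(X, y, groups, target_group, sizes, seed=0):
    """AUC on a held-out slice of one subgroup as the training sample grows."""
    rng = np.random.default_rng(seed)
    idx = np.flatnonzero(groups == target_group)
    rng.shuffle(idx)
    test_idx = idx[: len(idx) // 3]                       # held-out subgroup slice
    train_pool = np.setdiff1d(np.arange(len(y)), test_idx)
    aucs = []
    for n in sizes:
        sample = rng.choice(train_pool, size=n, replace=False)
        model = LogisticRegression(max_iter=1000).fit(X[sample], y[sample])
        aucs.append(roc_auc_score(y[test_idx], model.predict_proba(X[test_idx])[:, 1]))
    return np.array(aucs)

# Extrapolate: fit the curve to observed error, then project error at a larger n.
# If one subgroup's projected plateau sits above the others', targeted collection
# or relabeling for that subgroup is the likely fix.
# params, _ = curve_fit(power_law, sizes, 1 - aucs, p0=[1.0, 0.5, 0.05], maxfev=10000)
# projected_error = power_law(50_000, *params)
```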
What the study found
- Chest radiographs: Bias decreased by 29% to 96.5%, depending on the diagnostic finding, after AEquity-guided data collection.
- NHANES (National Health and Nutrition Examination Survey) mortality prediction: Bias fell by up to 80% with AEquity-guided data collection.
- Generalizability: The method held up across different datasets, model types, and intersectional subgroup analyses using standard fairness metrics.
These results point to a straightforward takeaway: better-curated data beats one-size-fits-all fairness patches later in the pipeline.
Why this matters for your organization
Bias in health AI is not a niche issue; it shows up in treatment recommendations, disease detection, and risk prediction. Recent evidence shows some generative models suggest different treatments for the same condition based only on sociodemographic inputs. That creates real risk for inequitable care if left unaddressed.
As health systems adopt more AI, teams need a repeatable process for measuring subgroup performance and closing gaps before tools touch patients. AEquity offers a template for that workflow.
How to put this into practice
- Run subgroup audits early: Track sensitivity, specificity, AUC, calibration, and error rates by sex, age, race/ethnicity, language, payer, and intersectional groups (a minimal audit sketch follows this list).
- Use data-guided collection: Expand underrepresented subgroups per finding or label, not just overall sample size.
- Tighten labeling QA: Add secondary review where disagreement is high; prioritize relabeling where learning curves show the biggest fairness gains.
- Set explicit fairness thresholds: Define acceptable gaps and require remediation before deployment or scale-up.
- Document decisions: Keep an auditable trail for dataset changes, label policies, and model updates.
- Monitor post-deployment: Recheck subgroup performance as populations shift or workflows change.
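As a starting point for the audit step above, the sketch below computes AUC, sensitivity, and specificity per subgroup and ranks groups by their gap from the best performer. The column names (`y_true`, `y_score`), the 0.5 decision threshold, and the choice of AUC as the gap metric are placeholder assumptions; swap in your own labels, scores, and fairness metrics.

```python
# Minimal subgroup-audit sketch. Column names and the 0.5 decision threshold
# are placeholders; it also assumes both classes appear in every subgroup.
import pandas as pd
from sklearn.metrics import confusion_matrix, roc_auc_score

def subgroup_audit(df: pd.DataFrame, group_col: str, threshold: float = 0.5) -> pd.DataFrame:
    rows = []
    for group, g in df.groupby(group_col):
        y_true = g["y_true"].to_numpy()
        y_pred = (g["y_score"] >= threshold).astype(int).to_numpy()
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
        rows.append({
            group_col: group,
            "n": len(g),
            "auc": roc_auc_score(y_true, g["y_score"]),
            "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
            "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        })
    report = pd.DataFrame(rows)
    report["auc_gap"] = report["auc"].max() - report["auc"]  # gap vs. best-performing group
    return report.sort_values("auc_gap", ascending=False)
```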
Expert perspective
"Tools like AEquity are an important step toward building more equitable AI systems, but they're only part of the solution," said Girish N. Nadkarni, M.D., senior corresponding author and chief AI officer of the Mount Sinai Health System. "If we want these technologies to truly serve all patients, we need to pair technical advances with broader changes in how data is collected, interpreted, and applied in health care. The foundation matters, and it starts with the data."
Policy and oversight
Federal oversight has been limited, so industry groups are moving ahead with guidance and accreditation. The Joint Commission, URAC, and the Coalition for Health AI are building frameworks to support safe, fair deployment. For teams planning evaluations, these resources can anchor internal governance and validation plans.
Next steps for healthcare leaders
If you are evaluating clinical AI, bake fairness into the data plan first, before model selection. Use learning-curve diagnostics to guide who to recruit, what to relabel, and where confidence is weakest. Build sign-off gates around subgroup performance and keep monitoring after go-live.
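One way to make the sign-off gate concrete is a simple check against pre-agreed limits, run on the audit report sketched earlier. The gap metric, the 0.05 limit, and the example group column below are illustrative assumptions; your governance group should set the actual thresholds.

```python
# Hypothetical deployment gate: block rollout when any subgroup's gap exceeds
# an agreed limit. The 0.05 AUC-gap limit is an illustrative placeholder.
def fairness_gate(report, group_col: str, max_auc_gap: float = 0.05) -> bool:
    failing = report[report["auc_gap"] > max_auc_gap]
    if not failing.empty:
        print("Remediation required for subgroups:", failing[group_col].tolist())
        return False
    return True

# Example usage with the audit sketch above ("race_ethnicity" is a placeholder column):
# ready = fairness_gate(subgroup_audit(df, group_col="race_ethnicity"), "race_ethnicity")
```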
Need to upskill your team on responsible AI workflows and audits? Explore practical training by job role at Complete AI Training.