Ethical Cosmetics: AI ends animal testing? A human-data model for skin irritation raises the bar
Researchers from Osmo and the Institute for In Vitro Sciences (IIVS) built an AI model that predicts skin irritation using human-relevant, non-animal data. Trained and validated on reconstructed human epidermis (RHE) assays, the system evaluated 3,000+ chemicals and generated safety insights that would have required up to 19,134 rabbits under legacy methods.
The team describes it as the most accurate skin irritation prediction model to date. More important for science: it shows a credible path to scale safety screening while staying aligned with human biology.
Key takeaways
- AI can cut animal use in skin irritation testing by shifting to validated human-relevant methods.
- Training on non-animal data improves human predictivity and reduces misclassification of both safe and risky chemicals.
- Active learning speeds discovery of safer ingredients while minimizing lab runs and cost.
- Osmo and IIVS estimate their work spared more than 19,000 rabbits.
Why human-relevant data matters
Animal skin does not always predict human response. That mismatch has historically removed safe ingredients from consideration and, at times, let risky ones pass. Jacob Sanders, senior machine learning engineer at Osmo, puts it plainly: the goal is to predict human skin outcomes, so the training data should reflect human biology and be obtainable at scale.
By building on validated non-animal assays, the model avoids both translation gaps and throughput limits common in animal testing. That shift improves scientific signal and practical usability for high-volume screening.
The build: RHE assays, active learning, and a tight feedback loop
The system started with Skin Irritation Tests based on reconstructed human epidermis models. These assays were also used in a Gates Foundation-funded project that explored compounds affecting disease-carrying insects, providing a high-quality base to calibrate Osmo's Olfactory Intelligence platform for skin irritation prediction.
Osmo's active learning models then selected about 100 molecules whose results would most accelerate performance. IIVS generated validated, ground-truth data with its high-throughput assay. Results flowed back into the model, forming a learn-test-learn loop that improved accuracy with the fewest possible experiments.
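To make the learn-test-learn loop concrete, here is a minimal sketch in Python. It assumes uncertainty sampling (picking the molecules whose predicted irritation probability is closest to 0.5), which is one common active learning strategy; the article does not say which selection criterion Osmo used. The names `select_most_uncertain`, `active_learning_loop`, `fit`, and `run_assay` are hypothetical placeholders, not part of any Osmo or IIVS system.

```python
from typing import Callable, Dict, Iterable, List

def select_most_uncertain(probs: Dict[str, float], k: int) -> List[str]:
    """Return the k molecule IDs whose predicted probability is closest to 0.5."""
    return sorted(probs, key=lambda mol: abs(probs[mol] - 0.5))[:k]

def active_learning_loop(
    pool: Iterable[str],
    labeled: Dict[str, bool],
    fit: Callable[[Dict[str, bool]], Callable[[str], float]],
    run_assay: Callable[[str], bool],
    batch_size: int,
    rounds: int,
) -> Dict[str, bool]:
    """Alternate between model fitting and lab testing.

    fit       -- hypothetical: trains on labeled data, returns a predictor
                 mapping a molecule ID to P(irritant)
    run_assay -- hypothetical: stands in for the lab measurement (True = irritant)
    """
    remaining = sorted(set(pool) - set(labeled))  # sorted for reproducible ties
    labeled = dict(labeled)
    for _ in range(rounds):
        if not remaining:
            break
        predict = fit(labeled)                      # learn
        probs = {mol: predict(mol) for mol in remaining}
        batch = select_most_uncertain(probs, batch_size)
        for mol in batch:                           # test
            labeled[mol] = run_assay(mol)
            remaining.remove(mol)
    return labeled                                  # feeds the next round of learning
```

The design point is that each lab run is spent where the model is least sure, so a small batch (about 100 molecules in the work described here) can move accuracy more than a random or convenience sample of the same size.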
Regulatory context and scientific fit
Skin irritation is a core toxicology endpoint used for labeling, handling, and regulatory submissions. The Draize rabbit test, introduced in 1944, has long been the mainstay, but New Approach Methodologies (NAMs), especially RHE-based tests, have become credible alternatives. See OECD's test guideline for in vitro skin irritation (TG 439) for reference.
OECD TG 439: In Vitro Skin Irritation
Predictive AI built on these assays lets teams screen out likely irritants early. As Sanders notes, that means fewer compounds need to be tested on animals or even humans, less wasted lab time, and more attention on chemistries with a real chance to pass.
Validation and adoption
Amanda Ulrey, president of IIVS, emphasizes that once human-relevant methods are validated for a defined use, the scientific case for animal testing weakens. Validation shows a method is reliable and appropriate for the job, increasing confidence across the community.
She points to a broader push: multiple endpoints are moving this way, with AI helping synthesize data and speed replacement when paired with high-quality, human-relevant lab outputs.
What it means for discovery
This model screens pure molecules for skin irritation risk. According to Osmo's CTO, Richard Whitcomb, candidates that pass have a strong chance of being non-irritants when used appropriately in fragrances.
Next step: extend predictions to account for concentration and dilution. Longer term, combining AI with other non-animal screens could enable faster, scalable safety assessments for full formulations across the industry.
Practical steps for R&D teams
- Adopt validated RHE assays and structure your data for ML ingestion from day one.
- Use active learning: prioritize compounds that maximally improve the model, not just the ones that are convenient to test.
- Document the context of use and validation evidence to align with regulatory expectations.
- Track both false negatives (safety risk) and false positives (innovation drag); tune thresholds by use case.
- Pair predictions with mechanistic readouts where possible to aid interpretation and trust.
- Plan for concentration-aware models if your program involves mixtures or finished formulations.
- Partner with labs experienced in NAMs to keep data quality high and turnaround tight.
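The advice above on tracking false negatives and false positives can be sketched as follows. This is an illustrative example only: the function names, the candidate thresholds, and the rule "minimize false positives subject to a cap on the false negative rate" are assumptions for demonstration, not a method described by Osmo or IIVS.

```python
from typing import Optional, Sequence, Tuple

def error_rates(
    probs: Sequence[float], is_irritant: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Return (false_negative_rate, false_positive_rate).

    A molecule is flagged as an irritant when its predicted
    probability is >= threshold.
    """
    pairs = list(zip(probs, is_irritant))
    positives = sum(1 for _, y in pairs if y)
    negatives = len(pairs) - positives
    fn = sum(1 for p, y in pairs if y and p < threshold)       # missed irritants
    fp = sum(1 for p, y in pairs if not y and p >= threshold)  # safe but flagged
    fn_rate = fn / positives if positives else 0.0
    fp_rate = fp / negatives if negatives else 0.0
    return fn_rate, fp_rate

def pick_threshold(
    probs: Sequence[float],
    is_irritant: Sequence[bool],
    max_fn_rate: float,
    candidates: Sequence[float],
) -> Optional[float]:
    """Among candidates meeting the false negative cap, pick the lowest FP rate.

    Returns None when no candidate satisfies the cap.
    """
    best: Optional[Tuple[float, float]] = None
    for t in candidates:
        fn_rate, fp_rate = error_rates(probs, is_irritant, t)
        if fn_rate <= max_fn_rate and (best is None or fp_rate < best[1]):
            best = (t, fp_rate)
    return None if best is None else best[0]
```

A safety-critical use case would set `max_fn_rate` near zero and accept more false positives, while an early discovery screen might relax the cap to avoid discarding promising ingredients.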
Organizations and resources
Institute for In Vitro Sciences (IIVS) provides expertise and validated methods for non-animal testing, including RHE-based skin irritation assays.
The takeaway is simple: pair high-quality human-relevant data with an active learning loop, and you get faster decisions, fewer animal tests, and safer ingredient pipelines. It's better science and better practice.