AI checks clinical trial reports for missing steps
Randomized, controlled trials set the standard for evidence in medicine. Yet too many published studies skip key reporting details, making it hard to judge quality and reproduce results. A University of Illinois Urbana-Champaign team trained AI models on PSC's NSF-funded Bridges-2 system to flag missing elements in trial reports based on established guidelines. Their goal: an open-source tool that authors and journals can use to plan, conduct, and report trials with fewer gaps.
Why this matters for researchers and editors
Random assignment and pre-specified outcomes reduce bias. But even when researchers follow best practices, those steps don't always make it into the paper. With thousands of trials published each year, manual checks for reporting completeness don't scale.
"Clinical trials are considered the best type of evidence for clinical care. If a drug is going to be used for a disease … it needs to be shown that it's safe and it's effective … But there are a lot of problems with the publications of clinical trials. They often don't have enough details. They're not transparent about what exactly has been done and how, so we have trouble assessing how rigorous their evidence is." - Halil Kilicoglu, University of Illinois Urbana-Champaign
How the team built the checker
The team grounded their work in the CONSORT 2010 and SPIRIT 2013 reporting guidelines, which together outline 83 key items for high-quality trials. They fine-tuned Transformer-based natural language processing models to detect whether papers reported those items.
Bridges-2 provided the GPU resources and ready-to-use software stack needed to train on large text datasets. The models learned from 200 randomized trial articles published between 2011 and 2022, with a portion labeled for training and the remainder held out for testing.
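The team hasn't released this exact code here, but the approach described above, fine-tuning a Transformer for multi-label sentence classification against checklist items with a train/test split, looks roughly like the sketch below using Hugging Face Transformers. The base model, file names, label subset, and hyperparameters are illustrative placeholders, not the team's actual configuration.

```python
# Minimal sketch: fine-tune a Transformer to tag sentences with CONSORT/SPIRIT items.
# Model choice, file names, labels, and hyperparameters are assumptions for illustration.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

ITEMS = ["randomization", "allocation_concealment", "primary_outcome"]  # hypothetical subset of checklist items

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "allenai/scibert_scivocab_uncased",
    num_labels=len(ITEMS),
    problem_type="multi_label_classification",  # a sentence can report several items at once
)

# Expected CSV columns: "text" plus one 0/1 column per checklist item.
data = load_dataset("csv", data_files={"train": "train_sentences.csv",
                                       "test": "test_sentences.csv"})

def preprocess(batch):
    enc = tokenizer(batch["text"], truncation=True, max_length=256)
    # Multi-label heads expect float label vectors, one slot per item.
    enc["labels"] = [[float(batch[item][i]) for item in ITEMS]
                     for i in range(len(batch["text"]))]
    return enc

data = data.map(preprocess, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="consort_tagger",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=data["train"],
    eval_dataset=data["test"],
    tokenizer=tokenizer,
)
trainer.train()
trainer.save_model("consort_tagger")  # keep the fine-tuned checkpoint for later inference
```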
"We are developing deep learning models. And these require GPUs, graphical processing units. And you know, they are … expensive to maintain … When you sign up for Bridges you get … the GPUs, and that's useful. But also, all the software that you need is generally installed. And mostly my students are doing this work, and … it's easy to get [them] going on [Bridges-2]." - Halil Kilicoglu, University of Illinois Urbana-Champaign
Training, testing, and results
The models were trained to map language patterns to specific checklist items. Performance was scored with F1, balancing precision and recall. The best models reached F1 scores of 0.742 at the sentence level and 0.865 at the article level.
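To make the two granularities concrete, here is a small sketch of how sentence-level and article-level F1 can be computed with scikit-learn. The aggregation rule used here (an article counts as reporting an item if any of its sentences is predicted positive) is an assumption for illustration, not necessarily the paper's exact definition, and the records are toy data.

```python
# Sketch of the two evaluation granularities: per-sentence decisions vs. per-article coverage.
from collections import defaultdict
from sklearn.metrics import f1_score

# Each record: (article_id, checklist_item, true 0/1, predicted 0/1) for one sentence. Toy data.
sentence_records = [
    ("art1", "randomization", 1, 1),
    ("art1", "randomization", 0, 1),
    ("art1", "primary_outcome", 1, 0),
    ("art2", "randomization", 1, 1),
    ("art2", "primary_outcome", 1, 1),
]

# Sentence-level F1: score every (sentence, item) decision directly.
y_true = [r[2] for r in sentence_records]
y_pred = [r[3] for r in sentence_records]
print("sentence-level F1:", f1_score(y_true, y_pred))

# Article-level F1: collapse sentences, then ask per (article, item) whether it was reported at all.
truth, pred = defaultdict(int), defaultdict(int)
for art, item, t, p in sentence_records:
    truth[(art, item)] |= t
    pred[(art, item)] |= p
keys = sorted(truth)
print("article-level F1:", f1_score([truth[k] for k in keys], [pred[k] for k in keys]))
```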
The work was published in Scientific Data, a Nature Portfolio journal, in February 2025, suggesting that AI can help screen trial reports for missing reporting items at scale.
What's next: better models and wider access
The team plans to improve performance with more training data and model distillation. The idea is to let a large model teach a smaller one that runs on a laptop or desktop.
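For readers unfamiliar with distillation, the sketch below shows the general technique in a plain classification setup: a small "student" is trained to match the softened predictions of a large "teacher" alongside the gold labels. The temperature, loss weighting, and toy tensors are illustrative defaults, not the team's implementation.

```python
# Sketch of knowledge distillation: student mimics teacher's soft predictions plus the true labels.
# Temperature T and mixing weight alpha are illustrative, not the team's actual settings.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft part: KL divergence between softened teacher and student distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard part: ordinary cross-entropy against the annotated labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage: a batch of 4 sentences, 3-way classification.
student_logits = torch.randn(4, 3, requires_grad=True)
teacher_logits = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 0])
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```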
The endgame is an open-source tool. Authors could pre-check manuscripts before submission. Journals could add an automated screening step and send papers back for fixes when items are missing.
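What would such a pre-check look like in practice? A rough sketch, assuming a fine-tuned multi-label tagger like the one above: run it over a manuscript's sentences and list checklist items with no supporting sentence. The checkpoint path, item names, and 0.5 threshold are placeholders, not a released tool.

```python
# Sketch of an author-side pre-check with a hypothetical fine-tuned checkpoint ("consort_tagger").
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

ITEMS = ["randomization", "allocation_concealment", "primary_outcome"]  # hypothetical subset

tokenizer = AutoTokenizer.from_pretrained("consort_tagger")  # checkpoint from the training sketch
model = AutoModelForSequenceClassification.from_pretrained("consort_tagger").eval()

sentences = [
    "Participants were randomly assigned (1:1) using a computer-generated sequence.",
    "The primary outcome was change in HbA1c at 12 weeks.",
]

with torch.no_grad():
    enc = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    probs = torch.sigmoid(model(**enc).logits)  # multi-label: per-item probability per sentence

covered = probs.max(dim=0).values > 0.5         # item counts as reported if any sentence supports it
missing = [item for item, ok in zip(ITEMS, covered.tolist()) if not ok]
print("Possibly missing items:", missing)
```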
Practical takeaways for your team
- Write to the checklist. Map each section of your manuscript to CONSORT/SPIRIT items, including randomization, allocation concealment, pre-specified outcomes, and analysis plans.
- Pre-register and pre-specify. Make your objectives and success criteria explicit up front, and report any deviations clearly.
- Adopt AI pre-checks once available. Use them as a first pass to catch omissions before peer review.
- If you're building similar tools, expect to fine-tune Transformer models on labeled sentences and sections, evaluate with F1, and budget for GPUs or HPC allocations (e.g., via NSF ACCESS).
Resources: CONSORT Statement and SPIRIT Statement.