GIST AI flags missing steps in clinical trial reports
Randomized controlled trials set the standard for testing safety and efficacy. Yet too many papers gloss over essential details, making it hard to judge study quality or reproduce results. A team at the University of Illinois Urbana-Champaign used the Pittsburgh Supercomputing Center's (PSC's) Bridges-2 system to train an AI model that spots missing steps and helps authors and journals fix reporting gaps before publication.
The goal: an open-source tool scientists can run on a laptop, and journals can add to their editorial workflow, to raise the bar on clinical trial reporting.
Why reporting falls short
Good trials start with true randomization and clear, predeclared outcomes. Without those, bias creeps in, and results become harder to trust. Even when teams follow best practices, they often leave key details out of their write-ups, and there are far more trial papers than any review team can check by hand.
"Clinical trials are considered the best type of evidence for clinical care ... But there are a lot of problems with the publications of clinical trials. They often don't have enough details. They're not transparent about what exactly has been done and how, so we have trouble assessing how rigorous their evidence is," says Halil Kilicoglu, associate professor of information sciences at Illinois.
Ground truth: CONSORT and SPIRIT
The team anchored the work on established reporting guidance: 83 checklist items from the CONSORT 2010 Statement and SPIRIT 2013 Statement. These items define what a complete, transparent trial protocol and results paper should include. See the CONSORT guidance here: consort-statement.org.
How PSC's Bridges-2 made it possible
Bridges-2 delivered the compute and software stack the team's students needed to iterate fast. GPUs were key for training Transformer-based NLP models on a curated set of 200 trial articles published between 2011 and 2022. The team accessed Bridges-2 through the NSF's ACCESS program, avoiding costly hardware overhead and setup delays.
"We are developing deep learning models. And these require GPUs ... When you sign up for Bridges you get the GPUs ... and all the software that you need is generally installed. It's easy to get students going on Bridges-2," Kilicoglu says.
What they built and how they tested it
- Task: Detect whether each CONSORT/SPIRIT checklist item is present, at both sentence and article level.
- Data: A portion of papers labeled for training; the rest held out for testing.
- Learning: The models adjusted internal weights to match labeled patterns, improving until additional training no longer helped.
- Metric: F1 score, balancing precision and recall (1 is perfect; 0 is worst).
The best model reached an F1 of 0.742 at the sentence level and 0.865 at the article level. Results appeared in Scientific Data (Feb 2025): doi.org/10.1038/s41597-025-04629-1.
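For readers unfamiliar with the metric, here is a minimal sketch of how an F1 score is computed from prediction counts. The function name and the example counts are illustrative assumptions, not values or code from the team's paper.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when either is undefined or zero."""
    predicted = true_positives + false_positives  # everything the model flagged
    actual = true_positives + false_negatives     # everything that was really there
    if true_positives == 0 or predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted  # share of flagged items that were correct
    recall = true_positives / actual        # share of real items that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a single checklist item (not taken from the paper).
example = f1_score(true_positives=80, false_positives=20, false_negatives=30)
print(round(example, 3))
```

Because F1 is a harmonic mean, a model cannot score well by being strong on precision alone or recall alone; both have to be reasonably high.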
What's next
- Scale up training data beyond the initial 200 papers to improve model performance.
- Use knowledge distillation so a large model trained on Bridges-2 can teach a smaller model that runs locally.
- Release tools openly so authors, reviewers, and journals can run checks before and during peer review.
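The article does not say how the team plans to implement knowledge distillation. As a rough illustration only, one common formulation trains the small "student" model to match the large "teacher" model's temperature-softened output probabilities. The sketch below assumes that formulation; the function names, the temperature value, and the example scores are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from teacher to student on softened outputs, scaled by temperature squared."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    kl = sum(
        p * math.log(p / q)
        for p, q in zip(teacher_probs, student_probs)
        if p > 0
    )
    return kl * temperature ** 2


# Hypothetical scores for one sentence over three candidate labels.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 1.0, 4.0]
print(distillation_loss(teacher, close_student) < distillation_loss(teacher, far_student))
```

The loss is zero when the student reproduces the teacher's output distribution exactly and grows as the two diverge, which is what lets a compact model inherit behavior from a larger one.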
Why this matters for your work
- For investigators: Run a pre-submission check to catch missing items early, such as randomization details, allocation concealment, and prespecified outcomes.
- For methodologists: Get consistent signals on reporting quality to prioritize deeper audit and replication.
- For journals: Add an automated checklist step to reduce back-and-forth and improve transparency.
- For patients: Better reporting supports better evidence, which leads to better decisions in care.
Clinical trials will always demand rigor at the bench and bedside. With AI assisting on the reporting front, the field can move faster and with more confidence, without adding more load to already stretched teams.