AI built preterm birth prediction models in minutes - sometimes outperforming expert teams

Generative AI wrote working preterm birth prediction code in minutes and carried a full study from concept to journal submission in about six months, sometimes topping expert teams. It cut months of coding to days, letting researchers focus on study design, checks, and replication.

Published on: Feb 18, 2026

AI built strong preterm birth predictors in months - and the workflow could change how biomedical studies get done

Generative AI tools just blitzed through a complex pregnancy dataset and produced accurate preterm birth prediction models faster than traditional teams - sometimes even outperforming them. For researchers, this isn't hype. It's a practical shift in how we build and ship analysis pipelines.

The takeaway: when used well, AI can compress months of coding into days, surface competitive baselines, and let scientists spend more time on study design, interpretation, and replication.

What happened

In a real-world test led by UC San Francisco (UCSF) and Wayne State University, researchers asked multiple teams - some human-only and others using AI - to predict preterm birth from datasets covering more than 1,000 pregnant women. The AI-assisted efforts moved fast and, in some cases, matched or beat expert-built models.

A junior duo - a UCSF master's student, Reuben Sarwal, and a high school student, Victor Tarca - used AI to generate functional code in minutes for tasks that typically take experienced programmers hours or days. The advantage wasn't magic; it was the AI's ability to produce workable analytical code from short, technical prompts.

Not every system delivered. Only 4 of 8 chatbots produced usable code. But those that worked didn't need large expert teams behind them. That efficiency let the group run experiments, verify results, and push a manuscript out within a few months.

"These AI tools could relieve one of the biggest bottlenecks in data science: building our analysis pipelines," said Marina Sirota, PhD, professor of Pediatrics and interim director of UCSF's Bakar Computational Health Sciences Institute. "The speed-up couldn't come sooner for patients who need help now." The study was published on February 17, 2026, in Cell Reports Medicine.

Why preterm birth prediction matters

Preterm birth is the leading cause of newborn death and a major driver of lifelong motor and cognitive disabilities. In the U.S., about 1,000 babies are born prematurely every day, yet we still don't fully know what triggers early labor.

To push the science forward, Sirota's team pooled vaginal microbiome data from around 1,200 pregnant women across nine studies and followed each pregnancy to delivery. "This kind of work is only possible with open data sharing, pooling the experiences of many women and the expertise of many researchers," said Tomiko T. Oskotsky, MD, co-director of the March of Dimes Preterm Birth Data Repository and associate professor at UCSF BCHSI.

The data volume and complexity were the sticking points. So the team turned to a global competition - DREAM Challenges - to crowdsource models. Over 100 groups took on pregnancy-focused challenges, including microbiome-based preterm prediction. Most teams finished within three months, but assembling findings and publishing took nearly two years.

Putting generative AI to the test

To see if AI could compress that cycle, the UCSF group partnered with a team led by Adi L. Tarca, PhD, at Wayne State University. They asked eight AI systems to independently build algorithms for the three DREAM challenge datasets using only natural-language prompts - no human coding.

The goals mirrored the original tasks: use vaginal microbiome data to flag risk of preterm birth and analyze blood or placental samples to estimate gestational age. Pregnancy dating is almost always an estimate, but it guides care; getting it wrong can disrupt decisions around timing and interventions.
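The two task framings described here - binary preterm-risk classification and gestational-age regression - can be sketched on toy data. Everything below (labels, week values) is illustrative, not from the study; the point is simply the naive baseline each task must beat before a model's performance means anything.

```python
import statistics

# Toy stand-ins for the two DREAM-style tasks (illustrative values only).
# Task 1: flag preterm risk - one binary label per pregnancy (1 = preterm).
preterm_labels = [0, 0, 1, 0, 1, 0, 0, 1]
# Task 2: estimate gestational age at sampling, in weeks.
gest_age_weeks = [32.0, 38.5, 29.0, 40.0, 34.5, 37.0]

# Simplest possible baselines each task must beat.
risk_baseline = sum(preterm_labels) / len(preterm_labels)   # cohort prevalence
age_baseline = statistics.mean(gest_age_weeks)              # cohort mean age

# Mean absolute error of the constant-mean gestational-age prediction.
mae = statistics.mean(abs(age_baseline - w) for w in gest_age_weeks)
print(round(risk_baseline, 3), round(mae, 2))  # → 0.375 3.33
```

Any generated model that cannot beat these constants on held-out data is adding nothing but complexity.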

The result: 4 of the 8 AI tools produced prediction models that performed on par with, and in some cases better than, the original DREAM submissions. The entire AI project - from concept to journal submission - wrapped in about six months.

Researchers stressed that oversight still matters. AI can produce misleading outputs, and scientific expertise isn't optional. But offloading the grunt work of pipeline building frees teams to ask better questions. "Thanks to generative AI, researchers with a limited background in data science won't always need to form wide collaborations or spend hours debugging code," Tarca said. "They can focus on answering the right biomedical questions."

What this means for your lab

  • Treat LLMs as code co-authors. Use them to scaffold data loaders, baseline models, cross-validation loops, and plotting - then refine and harden. You'll move from idea to runnable code in minutes, not days.
  • Run more experiments per week. Smaller teams can explore more hypotheses, compare modeling choices faster, and reserve expert time for study design, interpretation, and external validation.
  • Expect uneven tool quality. Benchmark multiple models. Keep prompts, generated code, and evaluation outputs under version control for traceability.
  • Protect reproducibility early. Pin dependencies, seeds, and OS/driver versions. Log hyperparameters. Containerize (e.g., Docker/Singularity) and export deterministic builds. Archive prompts and iterations.
  • Guard data privacy. Never paste PHI into public tools. Use self-hosted models or vetted vendors with BAAs. Align with IRB protocols and data-use agreements.
  • Validate like a skeptic. Use nested cross-validation, leakage checks, calibration curves, and stratified subgroup analyses. Favor simple, well-regularized baselines before pushing complexity.
  • Document the human-in-the-loop. Record what the AI proposed, what changed, and why. Many journals require disclosure of AI assistance; plan that section up front.
  • Upskill the team. Juniors can contribute sooner with guardrails and code review. Seniors shift effort to causal reasoning, clinical relevance, and generalization.
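Several of the bullets above (scaffolded cross-validation, seeded reproducibility, simple well-regularized baselines) can be combined in one small sketch. This is a minimal, stdlib-only illustration on made-up data - not the study's pipeline: a seeded stratified k-fold splitter scoring a constant prevalence baseline with the Brier score.

```python
import random
import statistics

def stratified_kfold(labels, k, seed):
    """Yield (train_idx, test_idx) folds with class balance preserved.

    Shuffling is driven by a seeded Random, so fold assignment is
    fully reproducible across runs - one of the pinning habits above."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for t in range(k):
        test = sorted(folds[t])
        train = sorted(i for f in range(k) if f != t for i in folds[f])
        yield train, test

# Toy stand-in cohort: 1 = preterm, 0 = term (roughly a 1-in-10 rate).
labels = [1 if i % 10 == 0 else 0 for i in range(100)]

briers = []
for train, test in stratified_kfold(labels, k=5, seed=42):
    # Baseline "model": predict the training-set prevalence for everyone.
    p = sum(labels[i] for i in train) / len(train)
    # Brier score of that constant prediction on the held-out fold only.
    briers.append(statistics.mean((p - labels[i]) ** 2 for i in test))
print(round(statistics.mean(briers), 4))  # → 0.09
```

Note the baseline prevalence is computed inside the loop from training indices only; computing it once over all subjects would be exactly the kind of leakage the validation bullet warns about.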

Limits and risks to manage

  • Hallucinated or brittle code. Add unit tests, small test datasets, and CI checks. Don't trust unreviewed outputs.
  • Bias and drift. Track performance across ancestry, age, site, and assay batches. Reassess as cohorts evolve.
  • Overfitting on small cohorts. Prefer penalized models and transparent features. Require external validation before claims.
  • Licenses and consent. Align analyses with repository terms and participant consent, especially for secondary use.
  • Authorship and credit. Be explicit about AI-assisted contributions, human oversight, and data provenance.
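The first risk above - hallucinated or brittle generated code - is the easiest to catch with a tiny fixed dataset and a few smoke checks. A minimal sketch, where `predict_risk` is a hypothetical stand-in for whatever function the AI generated (not an API from the study):

```python
def predict_risk(train_X, train_y, test_X):
    """Stand-in model: predict the training prevalence for each test subject."""
    p = sum(train_y) / len(train_y)
    return [p] * len(test_X)

def run_smoke_checks(predict_fn):
    """Run shape, range, and leakage checks on a tiny fixed dataset."""
    X = [[0.1, 0.2, 0.3], [0.4, 0.1, 0.0], [0.9, 0.8, 0.7],
         [0.2, 0.2, 0.2], [0.5, 0.6, 0.4], [0.7, 0.1, 0.9]]
    y = [0, 0, 1, 0, 1, 1]
    train_idx, test_idx = [0, 1, 2, 3], [4, 5]
    # Leakage check: train and test indices must never overlap.
    assert not set(train_idx) & set(test_idx), "train/test overlap"
    preds = predict_fn([X[i] for i in train_idx], [y[i] for i in train_idx],
                       [X[i] for i in test_idx])
    # Shape check: exactly one prediction per held-out subject.
    assert len(preds) == len(test_idx), "wrong number of predictions"
    # Range check: risk scores must be valid probabilities.
    assert all(0.0 <= p <= 1.0 for p in preds), "prediction outside [0, 1]"
    return preds

print(run_smoke_checks(predict_risk))  # → [0.25, 0.25]
```

Wiring checks like these into CI means every regenerated pipeline is re-vetted automatically, rather than trusted because it ran once.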

The bigger signal

This study shows the floor for productive participation in biomedical data science is dropping. With clear prompts, sane evaluation, and tight governance, AI can shorten cycles from hypothesis to manuscript - without replacing the judgment that makes science reliable.

Reference and funding

Cell Reports Medicine, February 17, 2026. UCSF authors: Reuben Sarwal; Claire Dubin; Sanchita Bhattacharya, MS; and Atul Butte, MD, PhD. Other authors: Victor Tarca (Huron High School, Ann Arbor, MI); Nikolas Kalavros and Gustavo Stolovitzky, PhD (New York University); Gaurav Bhatti (Wayne State University); and Roberto Romero, MD, D(Med)Sc (National Institute of Child Health and Human Development (NICHD)).

Funding: March of Dimes Prematurity Research Center at UCSF; ImmPort. Data generation was supported in part by the Pregnancy Research Branch of the NICHD.
