UTA's Xinlei Wang pairs AI with Bayesian models to turn CyTOF and RNA-seq data into clear disease insights in seconds. The open-source tools are designed to run on standard lab machines.

Published on: Oct 24, 2025
New AI tools help scientists track how diseases start

Thursday, Oct 23, 2025

Artificial intelligence moves fast. The real win for labs comes when AI is paired with clear statistical modeling that scientists can trust and explain.

At The University of Texas at Arlington, Xinlei (Sherry) Wang, Jenkins Garrett Professor of Statistics and Data Science in the Department of Mathematics and founding director for research in the Division of Data Science, is leading a four-year, $1.28 million federal effort to make complex single-cell data straightforward and actionable for disease research.

What Wang's team is building

The project, "Statistical and Deep Generative Modeling for Enhanced CyTOF Data Interpretation and Discovery," centers on a single Bayesian model that reads high-dimensional CyTOF data and returns interpretable results fast. The team is building a one-stop set of bioinformatic and statistical tools to help researchers go from raw files to insights with fewer handoffs and fewer ad hoc scripts.

In practice, that means a transparent model that explains how the data were generated and which parameters matter. Those parameters align with domain concepts (for example, higher protein expression in disease vs. control), so results are easier to validate and share across teams.
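To make the idea of an interpretable parameter concrete, here is a minimal sketch in Python. It is not the team's model. It assumes simulated expression values for one protein, flat priors, and a large-sample normal approximation, and the function name `posterior_shift` is hypothetical. The single parameter `delta` is the difference in mean expression between disease and control cells, reported with a credible interval rather than a bare point estimate.

```python
import random
from statistics import NormalDist, fmean, variance


def posterior_shift(disease, control):
    """Approximate posterior for delta = mean(disease) - mean(control).

    Assumption: flat priors and a large-sample normal approximation, so the
    posterior of delta is Normal(difference of sample means, standard error).
    """
    diff = fmean(disease) - fmean(control)
    se = (variance(disease) / len(disease)
          + variance(control) / len(control)) ** 0.5
    return NormalDist(mu=diff, sigma=se)


# Simulated (not real) expression values for a single protein marker.
rng = random.Random(0)
control = [rng.gauss(2.0, 1.0) for _ in range(5000)]
disease = [rng.gauss(2.3, 1.0) for _ in range(5000)]

post = posterior_shift(disease, control)
low, high = post.inv_cdf(0.025), post.inv_cdf(0.975)  # 95% credible interval
prob_up = 1.0 - post.cdf(0.0)  # posterior probability that expression is higher in disease

print(f"delta = {post.mean:.2f}, 95% interval [{low:.2f}, {high:.2f}], "
      f"P(delta > 0) = {prob_up:.3f}")
```

A positive `delta` with an interval that excludes zero reads directly as "higher expression in the disease group," which is the kind of statement a collaborator can check at the bench.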

Why CyTOF data is hard, and how the model helps

CyTOF can scan thousands of cells at once and measure dozens of proteins per cell. It's a gold mine for immunology and oncology, but it produces data that quickly overwhelms standard pipelines.

Wang's group tackles this by modeling the data-generating process and quantifying uncertainty. The output highlights the structure researchers care about (cell types, shifts between healthy and disease states, and the protein markers that drive those differences) without turning the analysis into a black box.

Speed, scale, and interpretability

"Without AI integrated into our Bayesian framework, you couldn't scale and it would take several days or even longer to get results," Wang said. "With AI, you get reliable, rigorous results within seconds, even for millions of cells. We model what we know about the data using transparent Bayesian models so the parameters are interpretable. For example, a parameter might indicate increased protein expression in the disease group compared to the control group."

The approach pairs deep generative models with Bayesian learning to keep both speed and statistical rigor. That combination is critical when you need to process millions of cells and still trust the inference.

An integrated single-cell view

The algorithms bring together single-cell transcriptomics (next-generation RNA sequencing) and CyTOF (single-cell protein profiling). That fusion gives researchers a clearer view of what's happening inside each cell across modalities, which is useful for finding rare cell states, tracking immune responses, and flagging drug-sensitive subpopulations.

The system can process millions of cells, each with 40-100 protein measurements or expression values for tens of thousands of genes, then identify cell types and compare healthy versus diseased cells at scale. This is the kind of throughput that bench-to-bedside studies require.

Early results and collaborators

Momentum is building. Kevin Wang, now a tenure-track assistant professor at Davidson University, earned the Best PhD Poster Award at the 2025 Conference of Texas Statisticians for presenting the team's preliminary results.

In parallel, a Nature Communications study co-authored by Xinlei Wang, postdoctoral researcher Zeyu Lu, and colleague Lin Xu introduced BIT (Bayesian Identification of Transcriptional Regulators from Epigenomics-Based Query Region Sets). BIT improves accuracy in gene regulation research by linking regulatory regions to plausible transcriptional drivers.

Additional collaborators include UTA's Division of Data Science; Li Wang (Mathematics); Yike Shen (Earth and Environmental Sciences); and UT Southwestern researchers Yuqiu Yang and Andy Xiao.

Software you can actually use

"AI is powerful, but it's often a black box," Wang said. "We are designing user-friendly, open-source software so end users can run it on their laptops. Existing algorithms can't handle big data this efficiently. We combine statistical rigor, uncertainty quantification, and scalability-all in one framework."

What this means for research teams

  • Shorter time-to-result: Move from multi-day runs to seconds on large datasets.
  • Interpretable outputs: Parameters map to biological signals you can test and publish.
  • Multi-omics by default: Joint analysis of CyTOF and scRNA-seq reduces pipeline fragmentation.
  • Reproducibility: A single Bayesian framework with uncertainty estimates supports stronger claims.
  • Practical deployment: Open-source tools designed to run on standard lab hardware.

About The University of Texas at Arlington (UTA)

Celebrating its 130th anniversary in 2025, UTA is a growing public research university in the Dallas-Fort Worth metroplex. With more than 42,700 students and 180+ degree programs, UTA is a Carnegie R-1 institution and the second-largest in the UT System.

UTA and its 280,000 alumni generate an annual economic impact of $28.8 billion for Texas. The University holds the Innovation and Economic Prosperity designation and is recognized for its focus on student access and success.
