AI predicts protein yield from mRNA, speeding targeted drugs and vaccines

RiboNN uses deep learning to predict translation efficiency from full mRNA sequences across cell types. It roughly doubles the accuracy of earlier models, which could speed the design of vaccines and protein therapies.

Published on: Sep 15, 2025

AI predicts how mRNA makes proteins, and why that matters for drug design

A new model called RiboNN estimates how efficiently cells translate mRNA into protein across human and mouse cell types. Built by a team at UT Austin and Sanofi, it uses deep learning to forecast translation efficiency (roughly, how much protein each mRNA copy yields) from the full mRNA sequence, not just the 5' UTR.

The result: fewer blind experiments, faster iteration on sequence designs, and better odds of getting the right protein made in the right cells.

What RiboNN actually does

RiboNN is a multitask deep convolutional neural network trained on more than 10,000 ribosome profiling experiments, covering 140 cell types and 3,819 datasets. It learns how dinucleotides, trinucleotides, codons, and their positions across the entire mRNA influence translation.
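RiboNN's convolutional layers learn these sequence patterns implicitly. To make the inputs concrete, the sketch below counts dinucleotides, trinucleotides and in-frame codons region by region, the kind of explicit features a simple baseline model might use. It is an illustration, not code from the study; the region-prefixed feature names and the T-to-U conversion are assumptions.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers (k=2: dinucleotides, k=3: trinucleotides)."""
    seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def codon_counts(cds: str) -> Counter:
    """Count non-overlapping, in-frame codons; a trailing partial codon is dropped."""
    cds = cds.upper().replace("T", "U")
    return Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))


def region_features(utr5: str, cds: str, utr3: str) -> dict:
    """Hypothetical feature dict. Counts are kept separate per region, so the
    same motif in the 5' UTR and in the 3' UTR becomes two different features."""
    feats = {}
    for name, region in (("utr5", utr5), ("cds", cds), ("utr3", utr3)):
        for k in (2, 3):
            for kmer, n in kmer_counts(region, k).items():
                feats[f"{name}:{kmer}"] = n
    for codon, n in codon_counts(cds).items():
        feats[f"codon:{codon}"] = n
    return feats
```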

The model encodes biological constraints like ribosomal processivity and tRNA availability. That lets it move beyond 5' UTR heuristics and capture sequence-context effects that typically surface only after rounds of wet-lab testing.
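Looking past the 5' UTR means feeding the model the whole transcript. Below is a minimal sketch of one plausible encoding, assuming a one-hot matrix over A/C/G/U plus a fifth channel that marks the first nucleotide of each codon in the coding sequence, so a convolution can distinguish reading frame from UTR sequence. The channel layout, zero padding and the `max_len` parameter are assumptions, not a description of RiboNN's published input format.

```python
from typing import List

NUCLEOTIDES = "ACGU"


def encode_transcript(utr5: str, cds: str, utr3: str, max_len: int) -> List[List[int]]:
    """Return a 5 x max_len matrix for one full transcript.

    Rows 0-3: one-hot A/C/G/U at each position.
    Row 4:    1 at the first nucleotide of each in-frame codon in the CDS.
    Positions past the end of the transcript stay all-zero (padding).
    """
    seq = (utr5 + cds + utr3).upper().replace("T", "U")
    if len(seq) > max_len:
        raise ValueError(f"transcript length {len(seq)} exceeds max_len {max_len}")
    channels = [[0] * max_len for _ in range(5)]
    for pos, base in enumerate(seq):
        if base in NUCLEOTIDES:  # ambiguous bases such as N stay all-zero
            channels[NUCLEOTIDES.index(base)][pos] = 1
    cds_start = len(utr5)
    for pos in range(cds_start, cds_start + len(cds) - 2, 3):
        channels[4][pos] = 1  # codon-start marker
    return channels
```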

From raw data to a usable model

The team first assembled and corrected a large public dataset of ribosome profiling studies. Undergraduate researchers helped manually review and fix missing or inconsistent annotations. This curated resource, named RiboBase, became the training backbone for RiboNN.
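Translation efficiency is usually estimated by comparing ribosome footprint reads from ribosome profiling with mRNA abundance from matched RNA-seq. The article does not say how RiboBase's training targets were computed, so the sketch below shows only a common, simple definition: the log2 ratio of library-size-normalized footprint counts to normalized mRNA counts, with a pseudocount (assumed here to be 1.0) so genes with zero reads do not cause division by zero.

```python
import math
from typing import Dict


def cpm(counts: Dict[str, int]) -> Dict[str, float]:
    """Counts per million: scale raw read counts by total library size."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def log2_te(ribo: Dict[str, int], rna: Dict[str, int],
            pseudocount: float = 1.0) -> Dict[str, float]:
    """Per-gene log2 translation efficiency from one matched ribo-seq/RNA-seq pair:
    footprint abundance relative to mRNA abundance. Only genes present in both
    libraries are scored."""
    ribo_cpm, rna_cpm = cpm(ribo), cpm(rna)
    shared = ribo_cpm.keys() & rna_cpm.keys()
    return {g: math.log2((ribo_cpm[g] + pseudocount) / (rna_cpm[g] + pseudocount))
            for g in sorted(shared)}
```

A positive value means a gene carries more ribosomes than its mRNA level alone would predict; a negative value means fewer.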

Development was a joint effort between UT Austin and Sanofi, with contributions from academic and industry researchers and support from the NIH and The Welch Foundation. Training ran on UT's Lonestar6 supercomputer. The collaboration also benefited from UT's Discovery to Impact office, which structured the agreement.

Performance and practical impact

Across many cell types, RiboNN typically delivered about twice the accuracy of prior models in predicting translation efficiency. That accuracy makes sequence engineering more predictable for vaccines, protein replacement therapies, and immuno-oncology constructs.
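The article does not name the metric behind "twice the accuracy." Sequence-to-TE models are often scored by correlating predicted and measured TE on held-out transcripts, so the sketch below assumes that setup and computes Pearson's r and its square (the share of variance explained).

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between predicted and measured values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def variance_explained(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Squared correlation (r^2): fraction of TE variance the predictions track."""
    return pearson_r(predicted, measured) ** 2
```

Which quantity doubles matters: a model at r = 0.5 explains 25% of the variance, while explaining twice that (50%) requires r of about 0.71. Both functions raise `ZeroDivisionError` if either input is constant.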

"Cells coordinate which mRNAs they produce and how efficiently they are translated into proteins," said Can Cenik, associate professor of molecular biosciences at UT Austin. "That is the value of curiosity-driven research. It builds the foundation for advances like RiboNN, which only become possible much later."

Why this matters for your pipeline

  • Sequence optimization: Prioritize coding sequences and UTRs that maximize protein yield in a target cell type before synthesis.
  • Targeted expression: Adjust sequences to favor translation in liver, lung, or immune cells to improve on-target biology.
  • Library design: Generate smaller, smarter libraries by focusing on sequence features RiboNN associates with high translation efficiency (TE).
  • Modified bases: Model how base-modified therapeutic RNAs may translate in vivo to balance durability, immunogenicity, and output.
  • Mechanistic insight: Inspect learned features to study ribosome dynamics, codon usage effects, and conserved 5' UTR patterns.
  • Cross-species planning: Compare predictions across human and mouse cell types to narrow the gap between preclinical models and the clinic.
  • QC and risk reduction: Use predictions to de-risk low-yield designs early and allocate bench time to the highest-probability candidates.
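For the sequence-optimization, targeted-expression and library-design uses above, the core loop is score-then-rank. This sketch assumes a hypothetical predictor that returns predicted TE per cell type for a sequence (RiboNN's actual interface is not described in the article); `rank_designs`, the selectivity score and `toy_predict` are all illustrative.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: any model that maps an mRNA sequence to
# predicted TE per cell type. Not RiboNN's real API.
Predictor = Callable[[str], Dict[str, float]]


def rank_designs(candidates: Sequence[str], predict: Predictor, target: str,
                 off_targets: Sequence[str] = (), top_k: int = 10
                 ) -> List[Tuple[str, float]]:
    """Score each candidate and return the top_k (sequence, score) pairs.

    Score = predicted TE in the target cell type, minus the highest predicted
    TE among off-target cell types (if any are given), so designs that
    translate well everywhere rank below selective ones.
    """
    scored = []
    for seq in candidates:
        te = predict(seq)
        score = te[target]
        if off_targets:
            score -= max(te[c] for c in off_targets)
        scored.append((seq, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def toy_predict(seq: str) -> Dict[str, float]:
    """Stand-in predictor for demonstration only: G fraction as 'liver' TE,
    A fraction as 'lung' TE. Replace with a real model's predictions."""
    return {"liver": seq.count("G") / len(seq), "lung": seq.count("A") / len(seq)}
```

Subtracting the best off-target score is one simple way to favor selective designs; a weighted sum or a ratio would be equally reasonable choices.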

How it works under the hood (short version)

The model learns sequence-function relationships directly from ribosome profiling signals. By treating translation efficiency prediction as a multitask problem across many cell contexts, it captures shared structure while adapting to cell-type differences.
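In a multitask setup, one shared representation of the sequence feeds a separate output per cell type, and not every transcript has a measurement in every cell type. The sketch below assumes that common design: linear per-cell-type heads on a shared embedding, and a mean-squared-error loss that skips missing (transcript, cell type) pairs. It illustrates the idea and is not RiboNN's published architecture or loss.

```python
from typing import List, Optional


def predict_all_tasks(shared: List[float], head_weights: List[List[float]],
                      head_bias: List[float]) -> List[float]:
    """One linear head per cell type, all reading the same shared embedding."""
    return [sum(w * x for w, x in zip(weights, shared)) + b
            for weights, b in zip(head_weights, head_bias)]


def masked_multitask_mse(pred: List[List[float]],
                         target: List[List[Optional[float]]]) -> float:
    """Mean squared error over observed (transcript, cell type) pairs only.

    Rows are transcripts, columns are cell types; None marks a cell type
    with no TE measurement for that transcript.
    """
    total, n = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if t is not None:
                total += (p - t) ** 2
                n += 1
    if n == 0:
        raise ValueError("no observed targets in batch")
    return total / n
```

Masking, rather than filling missing entries with zeros, keeps unmeasured cell types from pulling predictions toward an arbitrary value.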

Importantly, it accounts for where sequence features sit along the full transcript. That helps it infer how combinations of motifs affect initiation, elongation, and ribosome movement.
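Two classic, position-dependent signals at initiation are upstream AUGs in the 5' UTR, which can divert scanning ribosomes away from the main start codon, and the Kozak context around that start codon, where a purine at position -3 and a G at +4 matter most. The hand-written rules below only illustrate the kind of positional motif effect a model like RiboNN can pick up; they are not its learned features, and the strong/adequate/weak labels are a conventional simplification.

```python
from typing import List


def upstream_augs(utr5: str) -> List[int]:
    """0-based positions of AUG triplets in the 5' UTR, in any reading frame."""
    s = utr5.upper().replace("T", "U")
    return [i for i in range(len(s) - 2) if s[i:i + 3] == "AUG"]


def kozak_strength(utr5: str, cds: str) -> str:
    """Classify start-codon context using Kozak positions -3 and +4,
    numbering the A of the AUG start codon as +1."""
    utr = utr5.upper().replace("T", "U")
    c = cds.upper().replace("T", "U")
    if not c.startswith("AUG"):
        raise ValueError("CDS should start with AUG")
    minus3 = utr[-3] if len(utr) >= 3 else ""  # base 3 nt upstream of the A
    plus4 = c[3] if len(c) > 3 else ""         # base right after AUG
    hits = (minus3 in ("A", "G")) + (plus4 == "G")
    return {2: "strong", 1: "adequate", 0: "weak"}[hits]
```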

Caveats to keep in mind

  • Distribution shift: Predictions are strongest where the training data is dense. New cell types, delivery systems, or extreme sequence designs still need validation.
  • Context effects: Lipid nanoparticles, dose, and innate immune activation can modulate apparent protein output. Treat sequence as one lever among many.
  • Interpretability: While feature attributions are improving, not every model decision will map cleanly to a known mechanism.

A shared "translation program" across cells

A second study using the same dataset shows that mRNAs with related functions tend to translate at similar levels across cell types. That suggests a coordinated regulatory language linking transcription, stability, localization, and translation.

This coordination helps explain why certain 5' UTR and coding patterns are conserved and offers new angles for therapeutic design that align with cellular priorities.

Where to read more

The research is available in Nature Biotechnology. See the journal's site for details and links to the studies.

Bottom line

RiboNN sets a higher bar for predicting translation from sequence. For research teams, that means fewer cycles of guesswork, more targeted experiments, and faster progress on mRNA medicines that produce the right protein, in the right place, at the right time.