Biohub commits $500 million to build open biological data foundation for AI cell models

Biohub is committing $500 million over five years to build open datasets for AI-based cell modeling. The effort involves the Allen, Arc, Broad, and Wellcome Sanger institutes, with all data made freely available.

Published on: Apr 29, 2026

Biohub announced the Virtual Biology Initiative on April 29, a five-year effort to generate the massive datasets needed to build predictive models of human cells. The organization is committing $500 million to the project: $100 million to coordinate global data generation across institutions, and $400 million to develop new technologies for measuring and imaging biology.

The initiative addresses a fundamental constraint in AI for biology. Scientists can now envision accurate predictive models of cells, but building them requires orders of magnitude more data than currently exists. No single institution can generate this volume alone.

Who's involved

Biohub is partnering with the Allen Institute, Arc Institute, Broad Institute, and Wellcome Sanger Institute. The Human Cell Atlas and Human Protein Atlas consortia are also participating. NVIDIA will provide computing infrastructure and software. Renaissance Philanthropy is helping expand funding beyond Biohub's commitment.

These organizations have committed to coordinating their data-generation efforts and making all data freely available to the global research community.

What the data enables

Predictive models of cells could reveal how diseases develop and how to reverse them. Researchers could test thousands of hypotheses digitally instead of running experiments in the lab, accelerating the path from discovery to treatment.

The initiative builds on work Biohub has already funded, including the Human Cell Atlas, the Billion Cells Project (launched in 2025 across 17 leading institutions), and the Tabula Sapiens multi-organ cell atlas.

The technology investment

Biohub's $400 million internal commitment will fund the development of next-generation imaging tools, including cryo-electron tomography to resolve atomic-level details in cells and microscopy systems to observe billions of cells in living tissues. The organization is also building molecular and tissue engineering technologies to enable new measurements and experiments.

All data generated by Biohub will be made openly available.

Why this matters for researchers

A dataset of this scale could help answer fundamental questions about cellular behavior and disease. Researchers working in genomics, proteomics, transcriptomics, and cell biology will have access to shared resources built on a coordinated global effort, similar to how the Human Genome Project and the Protein Data Bank became foundational infrastructure for their fields.

The effort explicitly invites additional funders and research institutions to participate. The organizations involved acknowledge that reaching the necessary scale requires contributions beyond what this initial group can provide.


