Biohub launches $100 million push to build AI-ready biology database
A consortium of research institutions is committing $100 million to generate cellular data that AI models need to simulate human biology and accelerate drug discovery. The Virtual Biology Initiative, led by Biohub, aims to create a public database of cell types, behaviors, and states across the human body, foundational work that has never been completed at scale.
The effort brings together the Allen Institute, Arc Institute, Broad Institute, Wellcome Sanger Institute, and consortia including the Human Cell Atlas and Human Protein Atlas. NVIDIA and Renaissance Philanthropy are also partnering on the initiative.
The data gap blocking progress
AI models trained on protein and genomic databases have already shown they can design new proteins to target cancer cells and stop pathogens. But these models work only because they were trained on massive datasets. Cellular biology lacks an equivalent resource.
The vast majority of cellular activity has never been observed or measured. Before AI can simulate biology at the scale needed to model entire organs or tissues, researchers need to see what they're trying to predict.
"The scientific community will need to collaborate on an unprecedented scale," according to Biohub leadership.
What the data enables
AI models capable of simulating the immune system could allow researchers to engineer therapies that prevent cancer, neurodegeneration, and metabolic disorders at early stages. Current scientific methods reduce questions to their simplest form, stripping away the complexity that matters in actual human bodies. AI models face no such constraints.
A comprehensive cellular database would let researchers address questions that traditional laboratory methods cannot.
Measurement technology investments
Biohub is committing an additional $400 million to develop tools that can observe cells at scale. The roadmap includes advanced microscopy capable of tracking millions to billions of cells in living organisms, and cryo-electron tomography that resolves atomic-level detail within cells.
Cell and tissue engineering capabilities are also priorities, enabling researchers to run experiments and measure biological processes that are currently inaccessible.
Building on existing work
Universities and research institutes worldwide have already generated large-scale cellular datasets over the past decade. Last year, the Billion Cells Project network launched to create a massive open-source biological dataset. The Virtual Biology Initiative builds on these foundations while coordinating effort across institutions.
The consortium is calling on researchers and organizations with resources to contribute to the effort. The speed of progress depends on how quickly the scientific community can assemble the data.