Chan Zuckerberg Biohub releases AI tool that predicts structures of 1 billion proteins

Chan Zuckerberg Biohub released a free database of 1.1 billion predicted protein structures today, built with ESMFold2. It more than doubles existing structural data and draws heavily on genetic samples from soil and oceans that were absent from prior databases.

Categorized in: AI News Science and Research
Published on: May 31, 2026

AI tool predicts structures of 1 billion proteins in newly released atlas

Researchers at the Chan Zuckerberg Initiative's Biohub released an open-source database today containing predicted structures for more than 1 billion proteins. The ESM Atlas exceeds existing databases by hundreds of millions of entries and was built using ESMFold2, an AI model the team says outperforms Google DeepMind's AlphaFold3.

The atlas includes 1.1 billion predicted protein structures and sequence information for 6.8 billion proteins. Most come from metagenomic sequences (genetic material from soil, oceans, and other environments) that weren't included in previous databases.

How this expands the protein universe

The AlphaFold Database, the previous standard, contains predictions for roughly 200 million proteins. The previous ESM atlas held about 800 million entries. The new release more than doubles the available structural data.

Alex Rives, science head at Biohub and lead researcher, said the atlas reveals "the totality of protein biology and especially the parts that are most unknown." The team trained ESMFold2 on billions of proteins across the tree of life, including those metagenomic sequences absent from earlier databases.

Practical applications already demonstrated

Researchers used ESMFold2 to design new antibodies and proteins that bind to targets implicated in cancers and immune disorders. When tested in the lab, a high proportion of the designs functioned as predicted.

The team also identified structural similarities between CRISPR defense proteins and a gene-editing protein found in soil fungi, revealing connections between previously separate areas of protein biology.

ESMFold2 particularly excels at predicting structures of interacting proteins, including antibody-antigen complexes, where it outperforms existing methods.

Reception and remaining questions

Computational biologists see the atlas as a significant resource. Gemma Atkinson at Lund University called it "an extraordinary resource for biology" and noted that large-scale protein language models capture fundamental rules of protein structure.

Christine Orengo at University College London said the predictions could help uncover new protein folds and functions, though they will first need evaluation.

Some researchers urged caution. Martin Steinegger at Seoul National University questioned how well ESMFold2 predicts proteins that differ greatly from known structures. His team found that the original ESMFold struggled with unusual proteins, particularly those from metagenomic data.

Sergey Ovchinnikov at MIT views the ESM Atlas as a supplement to AlphaFold rather than a replacement. He noted that other proprietary and open-source models have also made gains in predicting protein interactions, though he expects ESMFold2's fully open-source nature and unrestricted commercial use to attract wide adoption.

The atlas is freely accessible.

