Materials Project hits 650,000 users and 32,000 citations, driving AI-ready discovery in materials science

Materials Project speeds materials discovery: 650k+ users, 32k+ citations, ML-ready datasets. High-quality data, 99.98% uptime, and tools to move research faster.

Published on: Jan 14, 2026

The Materials Project: The Data Engine Behind Faster Materials Discovery

  • Most-cited resource in materials science: The Materials Project's data and tools have been cited 32,000+ times.
  • Real impact, daily: 5,000 uses per day by a community of 650,000+ registered users.
  • AI-ready datasets: Curated, standardized data that shortens time-to-insight for ML-driven research.

Launched at Berkeley Lab in 2011, the Materials Project started as a simple idea: make high-fidelity materials properties accessible to everyone, no coding required. Today, it's the go-to database for researchers who want speed, scale, and credibility in materials R&D.

The platform now serves 650,000+ users and has crossed 32,000 citations in peer-reviewed literature. The growth signals a clear shift: researchers want clean, ML-ready data they can plug directly into workflows without spending months on preprocessing.

What You Get: Scale, Quality, and Speed

The database covers 200,000+ materials and 577,000+ molecules, with properties computed using advanced methods and validated against experiments. Over the last two years, it delivered 465 TB of data to the community - enough to keep even the most data-hungry pipelines fed.

For ML, the value is in the curation. You get standardized formats, electron density data, and benchmark-ready sets so you can train, validate, and deploy models with confidence. That means less time wrangling data and more time building predictive tools and testing hypotheses.

Built for High Availability and Heavy Use

During lab shutdowns, researchers leaned on the platform to keep projects moving. Usage has surged 2.5x since May 2022, and the system now runs on a modern cloud stack with MongoDB, Datadog, and AWS to support everything from rapid searches to massive downloads.

Uptime sits at 99.98%, so your queries, APIs, and interactive tools are there when you need them.
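
To put that figure in perspective, here is a quick back-of-the-envelope calculation. It is a sketch only: the article does not say over what window the 99.98% was measured, so the one-year period (and the helper name `allowed_downtime_hours`) are assumptions made for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # assumed window: one non-leap year


def allowed_downtime_hours(uptime_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime implied by an uptime percentage over a period."""
    return (1 - uptime_percent / 100) * period_hours


downtime = allowed_downtime_hours(99.98)
print(f"{downtime:.2f} hours per year (~{downtime * 60:.0f} minutes)")
# prints: 1.75 hours per year (~105 minutes)
```

Under that assumption, 99.98% uptime works out to less than two hours of unavailability across a full year.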

From Screening to Real Materials

The Materials Project's high-throughput modeling at the National Energy Research Scientific Computing Center (NERSC) enables rapid screening of large materials libraries for target properties. Results are grounded in computation and checked against experiments to guide synthesis efforts.

This approach has supported work across batteries, semiconductors, microelectronics, and catalysts - giving teams a way to prioritize what to make next with fewer dead ends.

Adoption Across Industry and Academia

Toyota Research Institute has long relied on the platform's open-source tools and data. As Brian Storey, TRI Vice President, put it: "The Materials Project serves as a strong bridge between industry and academia by providing the entire research community with transparently developed open-source tools."

Microsoft used the database to train models for generative materials design (MatterGen) and to help develop a new battery electrolyte through Azure Quantum. In 2020, a collaboration among UC Santa Barbara, Argonne, and Berkeley Lab identified and synthesized Mn₁₊ₓSb via MP-guided screening - a magnetic compound with promise for thermal cooling in electronics and other applications.

Community Data In, Better Science Out

Through MPContribs, labs, universities, and companies can contribute large datasets directly to the platform. This has expanded coverage and improved property diversity across the board.

Google DeepMind trained its GNoME models with MP data and contributed nearly 400,000 new compounds to the database, significantly broadening the design space for future studies.

Open Science at Scale

The Materials Project leads in datasets registered with the DOE's Office of Scientific and Technical Information (OSTI), supporting findability through services like Google Dataset Search. It's also one of seven DOE Office of Science PuRe Data Resources, a marker of high standards in data management and public reuse.

Practical Ways to Use It This Quarter

  • Screen candidate materials for specific target properties and down-select before you synthesize.
  • Train and benchmark ML models using curated datasets with consistent formatting and electron density data.
  • Pull large downloads to seed internal databases and build active-learning loops.
  • Share your lab's datasets via MPContribs to increase visibility and accelerate community validation.
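
The first item above, screening and down-selecting, can be sketched in a few lines. This is a minimal illustration, not the Materials Project's own method or API: the `Candidate` records are made-up placeholders rather than real database entries, and the property names, band-gap window, and stability cutoff are assumptions you would replace with your own targets and with records pulled from the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    material_id: str             # placeholder ID, not a real database entry
    formula: str                 # placeholder formula
    band_gap_ev: float           # assumed property: band gap in eV
    energy_above_hull_ev: float  # assumed stability proxy, eV per atom


def down_select(candidates, gap_range=(1.0, 2.0), max_hull_ev=0.05, top_n=10):
    """Keep candidates inside the band-gap window and under the stability
    cutoff, then rank the survivors from most to least stable."""
    low, high = gap_range
    keep = [
        c for c in candidates
        if low <= c.band_gap_ev <= high and c.energy_above_hull_ev <= max_hull_ev
    ]
    keep.sort(key=lambda c: c.energy_above_hull_ev)
    return keep[:top_n]


# Illustrative, invented records; real ones would come from the database.
candidates = [
    Candidate("demo-1", "A2B", 1.4, 0.00),
    Candidate("demo-2", "AB3", 0.2, 0.01),   # band gap too small
    Candidate("demo-3", "A3C", 1.8, 0.12),   # too far above the hull
    Candidate("demo-4", "BC2", 1.1, 0.03),
]

shortlist = down_select(candidates)
for c in shortlist:
    print(c.material_id, c.formula, c.band_gap_ev, c.energy_above_hull_ev)
```

With these placeholder records, only `demo-1` and `demo-4` survive both filters. The same pattern extends to any property you can express as a threshold or range, which is what makes a short synthesis list practical.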

Where It's Going Next

The team is advancing computational methods and support for complex behaviors, with tighter integration between simulation and experiment. A key push connects MP's pipeline to Berkeley Lab's A-Lab, an autonomous facility that uses robots and AI to synthesize materials - closing the loop from prediction to realization.

The goal is simple: shorten discovery timelines and make higher-confidence decisions at each stage of research.

Details That Matter

  • 650,000+ registered users; 5,000 uses per day
  • 32,000+ citations in peer-reviewed studies
  • 200,000+ materials and 577,000+ molecules
  • 465 TB delivered in the last two years
  • 99.98% uptime on a cloud-native stack

If you're ready to plug into the data backbone many teams are building on, explore the Materials Project and get your next study moving faster.

