AI model fills in missing hydrogen atoms in crystal structures

An AI model predicts hydrogen atom positions in inorganic crystal structures, reproducing the correct structure in about 85% of test cases. It fills gaps left by x-ray diffraction to accelerate materials design.

Published on: Jun 19, 2026
An AI model developed at the Paul Scherrer Institute (PSI) can now predict the positions of hydrogen atoms in the crystal structures of inorganic materials. The tool, built on Microsoft's MatterGen, fills in missing or incorrectly placed atoms that are often invisible to standard x-ray techniques, potentially speeding up materials simulation and the design of novel compounds like superconductors.

Hydrogen atoms scatter x-rays so weakly that they routinely go undetected in powder diffraction experiments, leaving crystal structures incomplete. Neutron or synchrotron sources can solve the problem, but they require expensive, large-scale facilities and more sample material. The result is a persistent gap in structural databases that hampers computational modelling of everything from thermal conductivity to vibrational spectra.

"I remember once I wanted to compare the results of our predictions of the crystal structure of cellulose with experiments," says Artem Oganov, a materials scientist at the Skolkovo Institute of Science and Technology who was not involved in the work. Despite cellulose being the most abundant polymer on Earth, the experimental structure still had missing hydrogen atoms.

Filling the gaps like an image editor

"If we don't know where the atoms sit, there's no way for us to simulate the material," says Timo Reents, a PhD student at the PSI Centre for Scientific Computing, Theory and Data. The team treated the problem as a kind of inpainting task. "We know the [positions of] heavy atoms, we know the unit cell shape," Reents says, "and we want to use this host structure [to] predict the hydrogen positions, which is kind of this missing part in the image." The generative model adds noise to the unknown positions and then refines the structure until it reaches the lowest energy configuration.
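
The inpainting idea can be illustrated with a deliberately simplified sketch. It is not the PSI model and does not use MatterGen: the learned denoiser is replaced by a hypothetical hand-written rule (`toy_denoise`) that pulls each hydrogen toward an assumed bond length from its nearest heavy atom, and periodic boundaries are ignored. All function names, parameters and coordinates are illustrative assumptions. What it does show is the structure of the loop: the host atoms stay fixed, the unknown positions start as noise, and each step adds a shrinking amount of noise before refining.

```python
import math
import random


def nearest_heavy(point, heavy):
    """Return the heavy-atom position closest to `point`."""
    return min(heavy, key=lambda atom: math.dist(point, atom))


def toy_denoise(h, heavy, bond_length, rate):
    """Hypothetical stand-in for a learned denoiser: nudge a hydrogen
    toward `bond_length` from its nearest heavy atom."""
    anchor = nearest_heavy(h, heavy)
    d = math.dist(h, anchor)
    if d == 0.0:
        return h  # no defined direction; leave the point where it is
    shift = rate * (bond_length - d) / d
    return tuple(hc + shift * (hc - ac) for hc, ac in zip(h, anchor))


def inpaint_hydrogens(heavy, n_hydrogens, bond_length=1.0,
                      n_steps=200, sigma0=0.5, rate=0.5, seed=0):
    """Fill in hydrogen positions around a fixed host of heavy atoms."""
    rng = random.Random(seed)
    # Unknown positions start as pure noise; the heavy atoms are never moved.
    hydrogens = [tuple(rng.uniform(-1.0, 4.0) for _ in range(3))
                 for _ in range(n_hydrogens)]
    for t in range(n_steps):
        sigma = sigma0 * (1.0 - (t + 1) / n_steps)  # noise decays to zero
        updated = []
        for h in hydrogens:
            noisy = tuple(c + rng.gauss(0.0, sigma) for c in h)
            updated.append(toy_denoise(noisy, heavy, bond_length, rate))
        hydrogens = updated
    return hydrogens


# Assumed toy host: three heavy atoms, 3 length units apart.
heavy = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
for h in inpaint_hydrogens(heavy, n_hydrogens=3):
    print(h, round(min(math.dist(h, a) for a in heavy), 3))
```

In a real generative model the refinement step would come from a trained network conditioned on the heavy atoms and the unit cell, not from a fixed distance rule.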

The model was trained on an inorganic structural database by artificially removing hydrogen atoms from known crystal structures, including over 800 DFT-generated structures with up to 20 atoms per unit cell. When the team expanded the dataset to materials with up to 40 atoms per unit cell and tested on thousands of structures, the model reproduced the correct crystal structure in about 85% of cases. In a further 12% it predicted structures that were more stable than the original.
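
The training setup described above, removing hydrogen atoms from known structures and asking the model to put them back, can be sketched as follows. This is a minimal illustration under assumptions, not the team's pipeline: the structure format (a list of element and position pairs), the helper names, the 0.1 tolerance and the greedy matching are all hypothetical, and a real comparison would also have to handle periodic boundaries and symmetry.

```python
import math


def strip_light_atoms(structure, element="H"):
    """Split a structure into the host (input) and the removed positions (target)."""
    host = [(el, pos) for el, pos in structure if el != element]
    targets = [pos for el, pos in structure if el == element]
    return host, targets


def positions_match(predicted, reference, tol=0.1):
    """True if each reference position has its own predicted position within `tol`.
    Uses simple greedy pairing, which is enough for a sketch."""
    if len(predicted) != len(reference):
        return False
    unused = list(predicted)
    for ref in reference:
        best = min(unused, key=lambda p: math.dist(p, ref))
        if math.dist(best, ref) > tol:
            return False
        unused.remove(best)
    return True


def match_rate(results):
    """Fraction of (predicted, reference) pairs that match."""
    return sum(positions_match(p, r) for p, r in results) / len(results)


# Assumed toy structure: one heavy atom with two hydrogens.
example = [("O", (0.0, 0.0, 0.0)),
           ("H", (0.96, 0.0, 0.0)),
           ("H", (-0.24, 0.93, 0.0))]
host, targets = strip_light_atoms(example)
print(host)
print(targets)
```

Passing a different `element` value, such as `"Li"`, would build the same kind of training pairs for another light atom.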

A hydrogen-agnostic approach

The model is publicly available and, according to Reents, "hydrogen agnostic": it can be retrained to predict the positions of other light atoms such as lithium or sodium. Pierre-Paul De Breuck, a computational materials scientist at Ruhr-Universität Bochum, sees immediate practical value. "Crystallographers can [also] use it as a fast, physically grounded starting point for refining ambiguous x-ray structures, instead of relying on 'chemically sensible' guesses," he says.

De Breuck cautions that the DFT simulations used for training are usually performed at 0 K, while real experiments happen at finite temperature, so lattice parameters and atomic positions may differ slightly. Still, for Oganov, the work addresses a long-standing bottleneck. "Hydrogen has the same rights as other atoms," he says. The details appear in an npj Computational Materials paper.

Why this matters for science and IT professionals

The model's public release underscores the growing role of AI in scientific research in solving persistent experimental challenges. For materials scientists and chemists, it offers a fast, physics-based way to complete crystal structures without relying on chemical intuition alone. For IT and development professionals, the approach shows how generative AI trained on domain-specific data can fill gaps in structured scientific datasets, a pattern that extends to other fields. Because the model is publicly available, teams can test or adapt it, which lowers the barrier to building similar refinement tools into their own workflows.

