AI advances drug discovery but data gaps limit what it can achieve, researcher says

AI has doubled the number of druggable proteins researchers can target, but algorithms can't design drugs for the remaining 75% of the human proteome without experimental data to learn from. The bottleneck isn't the AI; it's the data.

Published on: Apr 22, 2026
AI Isn't Solving Drug Discovery Yet, But It's Getting Closer

Artificial intelligence is reshaping how scientists discover new drugs, but headlines suggesting AI has replaced human researchers are wrong. The field is approaching a ceiling that smarter algorithms alone cannot break.

Machine learning depends on two things: large, well-organized datasets and a human-designed training framework. Drug discovery has both, which is why it stands to benefit more from AI than most fields. But that advantage only goes so far.

What AlphaFold Actually Did

The 2024 Nobel Prize in Chemistry went in part to AlphaFold's developers for predicting protein structures computationally. Two years later, media coverage still misrepresents what they achieved.

AlphaFold did not solve the protein folding problem. It cannot predict the structure of c-Myc, one of cancer's most important oncogenes. What it did accomplish was identifying protein structures similar to ones already known, something previous technologies could not do.

The practical result: using AlphaFold 2 models doubled the number of druggable proteins available to researchers. That matters. But it's not magic.

This capability exists because of decades of experimental work. The Protein Data Bank, founded in 1971 with seven structures, now contains nearly 250,000 structures representing over 750,000 distinct protein snapshots. Generations of computational scientists analyzed these data for patterns. AlphaFold trained on that foundation.

Where the Ceiling Hits

Drug discovery research has tested compounds against only one quarter of the human proteome. AI algorithms cannot identify the novel chemistry needed for the remaining three quarters without experimental data showing what works.

The problem is straightforward: AI learns from existing data. It cannot extrapolate far beyond it. Asking an algorithm to design drugs for untested proteins is like asking image-generation software to accurately render life forms from an exoplanet based only on Earth photographs.
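The extrapolation limit can be seen in a toy sketch. This is only an illustration, not a drug-discovery model: the sine curve standing in for an experimental measurement, the nearest-neighbor predictor, and every name below are assumptions chosen for simplicity. The model is trained on one region and then queried both inside and far outside it.

```python
import math

def true_response(x):
    # Hypothetical stand-in for an experimental measurement.
    return math.sin(x)

def knn_predict(train, x, k=3):
    # Average the responses of the k training points closest to x.
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k

# The "tested" region: 50 evenly spaced points on [0, 2].
train = [(i * 2 / 49, true_response(i * 2 / 49)) for i in range(50)]

def error_at(x):
    # Absolute gap between the prediction and the true value at x.
    return abs(knn_predict(train, x) - true_response(x))

inside_error = error_at(1.0)   # within the range the model has seen
outside_error = error_at(5.0)  # far outside that range

print(f"error inside training range:  {inside_error:.3f}")
print(f"error outside training range: {outside_error:.3f}")
```

In this sketch the error outside the training range is far larger than the error inside it, because the model can only echo the nearest examples it has seen. Adding training points near the new region, not changing the predictor, is what closes the gap.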

Breaking through requires more experimental data. Not better AI. Data.

What Comes Next

Investment in key data generation is essential. This means running more experiments, cataloging results systematically, and making that information available to researchers and algorithms.

Scientists must also define what each algorithm can and cannot do. Overstating capabilities erodes trust and leads to poor decisions about where to invest resources.

The scientific method should guide AI adoption in drug discovery, not hype. The field has the infrastructure to benefit from AI more than most. Maintaining that advantage depends on grounding claims in what the data actually shows.
