AI for Science: where it stands - and what researchers can do next
AI has moved from hype to hard results in core scientific domains. Biology leads, pulled forward by abundant data, urgent clinical needs, and clear validation cycles. A new research stack is emerging: foundation model + research agent + autonomous lab. This isn't a slogan - it's shipping in real labs and real pipelines.
Google DeepMind's stack: from structure to design, from simulation to discovery
Biology: AlphaFold solved protein structure prediction at scale and set the stage for generative biology. With AlphaProteo and AlphaMissense, the chain from target discovery to structure analysis to drug design is getting tighter and faster. The focus is shifting from "What is the structure?" to "What should we build next, and why?"
Meteorology: WeatherNext 2 (successor to GraphCast) uses data-driven forecasting to outperform the ECMWF HRES system across 99.9% of variables and lead times, with inference speed improved by orders of magnitude. That means earlier, more precise warnings and cheaper compute. For context on HRES, see the ECMWF overview here.
Materials and physics: GNoME has predicted millions of stable inorganic crystal structures - a library several times larger than the sum of historical experimental discoveries. AlphaQubit applies Transformer-based decoding to quantum error correction, identifying errors on data from real quantum chips more accurately than previous decoders.
Math and computing: AlphaEvolve applies evolutionary search to discover new ML algorithms and losses - stepping beyond hand-designed ideas. Inside Google, spinoffs like AlphaChip improved TPU v6 layouts; AlphaGeometry and AlphaProof pushed formal reasoning and proof.
For a primer on AlphaFold's broader impact, DeepMind's overview is useful here.
Biology is delivering: basic research and clinical translation
Single-cell intelligence: Google and Yale released C2S-Scale (27B parameters) for single-cell analysis. It generated testable hypotheses on cancer cell behavior that were validated in vitro - a concrete example of AI proposing ideas, not just summarizing data.
Protein dynamics and engineering: Microsoft's BioEmu accelerates protein dynamics simulation by up to 100,000×. A team from the Chinese Academy of Sciences introduced an inverse folding model blending structure and evolutionary constraints, pointing to more controllable protein design.
Genomics pipeline: After a decade of work, Google's stack spans sequencing, variant calling, expression prediction, pathogenicity scoring, and downstream detection/diagnosis - a cohesive path from raw reads to clinical relevance.
Clinical momentum: DeepGEM (Tencent Life Science Lab with partners) predicts lung cancer gene mutations from routine pathology slides in about a minute with 78-99% accuracy, expanding what pathology can do. Google's DeepSomatic improves somatic mutation calling across cancers such as leukemia, breast, and lung. And an AI-optimized drug candidate (MTS-004) has completed Phase III in China, crossing a long-standing barrier where many AI-driven programs have stalled at Phase II.
Materials, meteorology, physics, and math: acceleration continues
Materials discovery: New companies are forming around automated search and validation - Periodic Labs (superconductors), CuspAI (carbon-capture materials), and RhinoWise (energy and semiconductor materials). The playbook is consistent: large model priors, active learning, and closed-loop robotics.
Weather and fusion: DeepMind's hurricane model supported early path and intensity warnings for storms like "Melissa." In fusion, CFS leverages Google's open-source TORAX for SPARC development, showing how shared tools shorten iteration cycles.
Math and algorithms: Researchers are using next-gen language models to probe long-standing problems (e.g., Erdős-type questions). NVIDIA's open-source GenCluster took gold at IOI 2025, and labs report steady gains on Olympiad-style benchmarks with dedicated math models.
The new scientific workflow: model, agent, lab
1) General foundation models as the "OS" for research: Modern LLMs boost literature triage, method planning, code, and analysis. Systems like Claude Sonnet 4.5 show clearer process-following in life-science tasks and connect to external tools more reliably - critical for reproducibility.
2) Domain-specialized models as engines of depth: In biology, chemistry, climate, materials, and math, specialized models encode domain rules and priors. Examples include C2S-Scale, BioEmu, and DeepGEM. The Panshi scientific model from CAS blends a general backbone with domain heads - a practical pattern for institutes.
3) Research agents for proactive discovery: Agents can plan experiments, write and run code, call domain tools, and iterate without hand-holding. Harvard and MIT's ToolUniverse aggregates 600+ scientific tools for agent use. Google's AlphaEvolve already optimizes real workloads like chip design and data-center scheduling. Teams in Shanghai and Zhejiang introduced "Agentic Science" to define full AI-driven research loops. A minimal sketch of this plan-act-critique loop follows this list.
4) Autonomous labs for scale and speed: Robotics plus AI shifts labs from manual trial-and-error to high-throughput, closed-loop "science factories." Universities and national labs (MIT, University of Liverpool's MIF) and groups like IKTOS, Atinary SDLabs, and FULL-MAP are operating credible platforms. Startups such as Lila Sciences and Periodic Labs are raising significant funding to build at industrial scale.
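To make the agent pattern concrete, here is a minimal Python sketch of the plan-act-critique loop described above. Every function in it is a hypothetical stub - this is not the API of ToolUniverse, AlphaEvolve, or any other system named here. In practice you would swap the stubs for an LLM client, a domain model, and an instrument driver.

```python
# Minimal sketch of the plan -> act -> critique loop behind a research agent.
# All names are hypothetical stubs: propose_hypothesis would prompt a general
# LLM, run_tool would dispatch to a domain model or instrument, and critique
# would judge whether the result supports the hypothesis.
from dataclasses import dataclass, field


@dataclass
class Finding:
    hypothesis: str
    result: str
    supported: bool


@dataclass
class AgentState:
    goal: str
    findings: list[Finding] = field(default_factory=list)


def propose_hypothesis(state: AgentState) -> str:
    # Stub: a real agent would prompt an LLM with the goal and prior findings.
    return f"hypothesis {len(state.findings) + 1} for goal: {state.goal}"


def run_tool(hypothesis: str) -> str:
    # Stub: a real agent would call a structure predictor, simulator,
    # or lab instrument here and return its output.
    return f"observation for '{hypothesis}'"


def critique(hypothesis: str, result: str) -> bool:
    # Stub: a real agent would ask the LLM (or apply a rule) whether the
    # result supports the hypothesis.
    return len(result) > 0


def research_loop(goal: str, max_iters: int = 5) -> AgentState:
    """Iterate plan -> act -> critique until a hypothesis is supported."""
    state = AgentState(goal=goal)
    for _ in range(max_iters):
        hyp = propose_hypothesis(state)      # plan the next experiment
        result = run_tool(hyp)               # act: code / tool / instrument call
        supported = critique(hyp, result)    # critique the result
        state.findings.append(Finding(hyp, result, supported))
        if supported:                        # next-step planning: stop or refine
            break
    return state


if __name__ == "__main__":
    state = research_loop("find a tighter binder for target X")
    for f in state.findings:
        print(f.hypothesis, "->", "supported" if f.supported else "rejected")
```

The load-bearing piece is the state object: feeding prior findings back into each planning step is what separates an iterative research loop from a one-shot prompt.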
Infrastructure is converging
Policy and funding are now aiming at an integrated stack: compute, models, datasets, and autonomous labs on a common platform. Recent US initiatives prioritize advanced manufacturing and next-gen research infrastructure to shorten time-to-insight and time-to-impact. In China, efforts from Jingtai Technology (AI + robotics), CAS's ChemBrain Agent + ChemBody Robot, and the Beijing Academy of Scientific Intelligence's Uni-Lab-OS show a coordinated push toward domestic platforms. CAS's Panshi platform manages data, models, and toolchains across life science, high-energy physics, and mechanics.
AI for science. Science for humanity.
Expect faster iteration in the next few years as foundation models scale and lab automation matures. The research paradigm will stabilize around the model-agent-lab loop. Some leaders forecast discoveries on the scale of 20th-century theories before decade's end.
The boundary condition remains the same: scientists set direction, define standards, and keep ethics front and center. AI should compress the distance from question to answer - and broaden who can participate - without sidelining scientific judgment.
What to do now: a practical playbook for labs and institutes
- Audit your data: identify high-signal datasets, label gaps, privacy constraints, and where synthetic data could help.
- Stand up a dual-stack: one general LLM for orchestration and notes; a small set of domain models for core tasks (e.g., structures, dynamics, genomics, materials).
- Build agentic workflows: literature triage → hypothesis generation → code + tool calls → result critique → next-step planning. Start with 1-2 high-value SOPs.
- Close the loop: connect agents to automated instruments where feasible (pipetting, synthesis, characterization). Even partial automation compounds ROI.
- Measure what matters: pick 3-5 metrics tied to scientific value (e.g., hits per iteration, prediction-to-validation lag, cost per validated lead).
- Governance and traceability: log prompts, code, parameters, data lineage, and decisions. Reproducibility is your shield and your flywheel. (A minimal logging sketch follows this list.)
- Upskill the team: short, recurring training beats one-off bootcamps. Pair scientists with ML engineers for weekly method sprints.
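To make the governance bullet concrete, here is a minimal sketch of an append-only run manifest: one JSON line per workflow step, recording the prompt, parameters, input-data hashes, and the decision taken. The schema, field names, and file location are illustrative assumptions, not a standard.

```python
# Minimal sketch of an append-only traceability log for agentic workflows.
# One JSON line per step; schema and paths are illustrative assumptions.
import hashlib
import json
import time
from pathlib import Path

LOG_PATH = Path("runs/manifest.jsonl")  # hypothetical log location


def fingerprint(path: str) -> str:
    # Content hash of an input file, so data lineage can be verified later.
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()[:16]


def log_step(step: str, prompt: str, params: dict,
             inputs: list[str], decision: str) -> None:
    record = {
        "ts": time.time(),                              # when the step ran
        "step": step,                                   # e.g. "hypothesis", "tool_call"
        "prompt": prompt,                               # exact prompt sent to the model
        "params": params,                               # model/tool parameters used
        "inputs": {p: fingerprint(p) for p in inputs},  # data lineage by content hash
        "decision": decision,                           # what was decided, and by whom
    }
    LOG_PATH.parent.mkdir(parents=True, exist_ok=True)
    with LOG_PATH.open("a") as f:
        f.write(json.dumps(record) + "\n")


# Example call (paths and values are hypothetical):
# log_step("tool_call", prompt="predict structure of target X",
#          params={"model": "your-domain-model", "seed": 0},
#          inputs=["data/target.fasta"], decision="advance to wet-lab assay")
```

Hashing inputs rather than copying them keeps the log small while still letting you prove later exactly which data each decision rested on.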
If you're setting up training tracks by role or domain, this catalog can help you find focused options: AI courses by job.