AI Predicts 1,000 Diseases Up to 20 Years Before Symptoms

Delphi-2M forecasts risk for more than 1,000 diseases up to 20 years ahead, using UK Biobank and Danish registry records. Its predictions, reported as well calibrated, could guide screening, trials and planning.

Published on: Oct 26, 2025
Delphi-2M: An AI system that forecasts disease risk 10-20 years ahead

A research team spanning the German Cancer Research Center, the University of Copenhagen, and the European Molecular Biology Laboratory has introduced Delphi-2M, a transformer-based model trained on about 400,000 UK Biobank participants and validated on 1.9 million Danish medical records. The system predicts individual risk trajectories for more than 1,000 diseases, far beyond the scope of single-condition models.

Published in Nature, the work reports that predicted risks track observed outcomes closely across time horizons. As Professor Ewan Birney (EMBL) put it: "If our model says it's a one-in-10 risk for the next year, it really does seem like it turns out to be one in 10."
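
To make the "one-in-10" statement concrete, here is a minimal sketch of a calibration check: group people by predicted risk and compare the average prediction in each group with the share who actually had the event. The function name and the toy data are invented for illustration; this is not the evaluation code used in the study.

```python
from collections import defaultdict

def calibration_table(predicted, observed, n_bins=10):
    """Group predictions into equal-width risk bins and compare the mean
    predicted risk in each bin with the observed event rate."""
    bins = defaultdict(list)
    for p, y in zip(predicted, observed):
        # Clamp so that a prediction of exactly 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    table = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_pred = sum(p for p, _ in pairs) / len(pairs)
        event_rate = sum(y for _, y in pairs) / len(pairs)
        table.append((idx, len(pairs), mean_pred, event_rate))
    return table

# Toy data (hypothetical): ten people each given a 10% one-year risk,
# of whom exactly one goes on to have the event.
predicted = [0.1] * 10
observed = [1] + [0] * 9
table = calibration_table(predicted, observed)
for idx, n, mean_pred, event_rate in table:
    print(f"bin {idx}: n={n}, predicted={mean_pred:.2f}, observed={event_rate:.2f}")
```

A well-calibrated model shows predicted and observed values close together in every bin, not only on average.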

What's new here

  • Sequence-aware modeling: The transformer architecture learns temporal dependencies across medical events, capturing how conditions and risk factors unfold over time.
  • Multi-disease prediction at scale: One model produces risk estimates for 1,000+ conditions, including type 2 diabetes and heart attacks, instead of a separate model for each diagnosis.
  • Long-horizon forecasting: Reported risk windows extend 10-20 years, with stronger signals for cancers, cardiovascular events, and conditions with clearer progression patterns.
  • Population-scale training: UK Biobank and Danish national registry data let the model capture population-level patterns while preserving individual-level timelines.
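
As a rough illustration of what "sequence-aware" means in practice, the sketch below turns one person's diagnosis history into a time-ordered list of token IDs with the age at each event, which is the general kind of input a transformer consumes. The `Event` class, the `to_sequence` helper, and the tiny vocabulary are hypothetical and are not taken from the Delphi-2M code.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Event:
    code: str   # diagnosis code, e.g. ICD-10 (vocabulary here is illustrative)
    when: date  # date the diagnosis was recorded

def to_sequence(birth: date, events: list[Event], vocab: dict[str, int]):
    """Return (token_ids, ages_in_years), ordered by event date."""
    ordered = sorted(events, key=lambda e: e.when)
    # Assign the next free integer ID to any code not yet in the vocabulary.
    token_ids = [vocab.setdefault(e.code, len(vocab)) for e in ordered]
    ages = [round((e.when - birth).days / 365.25, 2) for e in ordered]
    return token_ids, ages

# Toy timeline (invented): hypertension, type 2 diabetes, myocardial infarction.
history = [
    Event("I10", date(2005, 6, 1)),
    Event("E11", date(2001, 3, 15)),
    Event("I21", date(2012, 9, 30)),
]
vocab: dict[str, int] = {}
token_ids, ages = to_sequence(date(1960, 1, 1), history, vocab)
print(token_ids, ages)
```

Keeping the age alongside each token is one simple way to let a model see how far apart events occurred, rather than only their order.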

Why this matters for science and clinical research

  • Earlier identification: Flags high-risk individuals well before symptoms appear, enabling earlier screening, lifestyle guidance, and closer monitoring.
  • Trial design: Enrich cohorts with pre-symptomatic participants for prevention and progression studies; power long-term endpoints with better event timing.
  • Resource planning: Health systems can stratify populations to prioritize outreach, diagnostics, and follow-up pathways.
  • Etiology clues: Event-to-event dependencies offer hypotheses about disease sequences and comorbidity routes.
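
The trial-enrichment and stratification ideas above can be sketched in a few lines: pick participants whose predicted risk clears a threshold and estimate how many events that group would be expected to yield if the predictions are calibrated. The function name, participant IDs, risks, and threshold are all made up for illustration.

```python
def enrich_cohort(risks: dict[str, float], threshold: float):
    """Select participants at or above a risk threshold.

    Returns the selected IDs (highest risk first) and the expected number
    of events among them, assuming the risks are well calibrated."""
    selected = {pid: r for pid, r in risks.items() if r >= threshold}
    expected_events = sum(selected.values())
    ranked = sorted(selected, key=selected.get, reverse=True)
    return ranked, expected_events

# Hypothetical predicted 10-year risks for five people.
risks = {"p1": 0.02, "p2": 0.30, "p3": 0.15, "p4": 0.05, "p5": 0.40}
ranked, expected_events = enrich_cohort(risks, threshold=0.15)
print(ranked, round(expected_events, 2))
```

In a real study the threshold would be a design decision that trades cohort size against event rate, and the calibration assumption would need checking in the target population.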

Method and validation notes researchers will care about

  • External validity: The team plans further testing across countries and demographic groups. Expect dataset shift, coding differences, and care-pathway effects to matter.
  • Calibration first: The quoted reliability is encouraging. Still, verify Brier scores, calibration plots, and decision curves across age, sex, ancestry, and deprivation strata.
  • Phenotypes and leakage: ICD mapping, incident vs. prevalent case handling, and lookback windows can swing performance. Document label timing and censoring.
  • Missingness and bias: EHR gaps, survivorship bias, and healthcare access patterns require sensitivity analyses and subgroup reporting.
  • Interpretability for use: Event-level attributions, risk drivers by horizon, and counterfactuals can help clinicians assess actions and trust.
  • Governance: Privacy, audit trails, and post-deployment monitoring are table stakes for clinical rollout.
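
For the stratified checks mentioned above, a minimal sketch of two of the metrics: the Brier score (mean squared error between predicted risk and the 0/1 outcome) and decision-curve net benefit at a single threshold, each computed per subgroup. The helper names and the four toy rows are assumptions for illustration; a real analysis would also need confidence intervals and far larger samples.

```python
from collections import defaultdict

def brier_score(predicted, observed):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(predicted)

def net_benefit(predicted, observed, threshold):
    """Net benefit at one risk threshold:
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(predicted)
    tp = sum(1 for p, y in zip(predicted, observed) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(predicted, observed) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)

def metrics_by_stratum(rows, threshold):
    """rows: iterable of (stratum, predicted_risk, outcome) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for stratum, p, y in rows:
        grouped[stratum][0].append(p)
        grouped[stratum][1].append(y)
    return {
        s: {"brier": brier_score(ps, ys),
            "net_benefit": net_benefit(ps, ys, threshold)}
        for s, (ps, ys) in grouped.items()
    }

# Toy rows (invented): the model does well in stratum "F" and poorly in "M".
rows = [("F", 0.8, 1), ("F", 0.2, 0), ("M", 0.6, 0), ("M", 0.4, 1)]
report = metrics_by_stratum(rows, threshold=0.5)
for stratum, metrics in sorted(report.items()):
    print(stratum, metrics)
```

Reporting metrics per stratum rather than pooled is what exposes a model that looks fine overall but underperforms for a particular group.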

Environmental cost, practical upside

Training and serving large models draw significant energy and can stress local grids. That said, accurate long-term risk forecasts can reduce unnecessary tests, focus clinical attention, and potentially cut avoidable admissions. Similar modeling approaches are being applied to hazards like severe weather and to grid planning for clean energy, areas where better prediction can translate into fewer wasted resources.

What's next

  • Prospective and international validation: Head-to-head comparisons with established risk scores and clinician judgment, plus transportability checks.
  • Clinical integration: EHR connectors, alert thresholds, and handoff protocols. Measure impact on workflow, patient outcomes, and costs.
  • Regulatory and reporting: Transparent documentation (e.g., TRIPOD-AI style), preregistered evaluation plans, and post-market surveillance once deployed.
