ChatGPT-Style AI Forecasts 1,000 Diseases Years in Advance

Delphi-2M forecasts the risk of more than 1,000 diseases years ahead from longitudinal health records. The results are promising, but dataset biases and pending validation mean it is not ready for clinical use.

Published on: Sep 22, 2025

Delphi-2M: An AI model that forecasts disease risk years ahead

Researchers across the UK, Denmark, Germany, and Switzerland have built an AI model, Delphi-2M, that estimates the future risk of more than 1,000 diseases from longitudinal health records. It adapts the transformer architecture behind consumer chatbots such as ChatGPT, applying it to sequences of medical events rather than words.

Trained on the UK Biobank and evaluated on nearly two million records from Denmark's public health database, the system shows promise in predicting who is at substantially higher or lower risk than traditional factors suggest. The team emphasizes it is not ready for clinical use.

How it works

Delphi-2M reads a patient's case history as a sequence, learning which diagnoses and events tend to occur together and in what order. As Moritz Gerstung of the German Cancer Research Center explains, the model picks up patterns in healthcare data much as language models learn grammar.
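To make the "history as a sequence" idea concrete, here is a minimal sketch of how a patient record might be turned into tokens for next-event training. The `Event` record, the made-up `SEX_F` baseline token, the vocabulary scheme and the age-in-years companion list are all assumptions for illustration; the published model uses its own encoding, which this does not reproduce.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    """One hypothetical record entry: the patient's age at the event and a code."""
    age_days: int  # patient age in days when the event was recorded
    code: str      # e.g. an ICD-10 code such as "E11"; illustrative only


def build_sequence(events: List[Event], vocab: Dict[str, int]) -> Tuple[List[int], List[float]]:
    """Order events by age and map each code to an integer token ID.

    The age of each event is kept alongside its token, loosely playing the role
    that word position plays in a language model. New codes are added to `vocab`.
    """
    ordered = sorted(events, key=lambda e: e.age_days)
    tokens: List[int] = []
    ages: List[float] = []
    for e in ordered:
        if e.code not in vocab:
            vocab[e.code] = len(vocab)  # assign the next free ID on first sight
        tokens.append(vocab[e.code])
        ages.append(round(e.age_days / 365.25, 2))  # age in years
    return tokens, ages


def next_event_examples(tokens: List[int]) -> List[Tuple[List[int], int]]:
    """Autoregressive training pairs: each prefix of the history and the event that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


if __name__ == "__main__":
    # Toy history: I21 (acute myocardial infarction) and E11 (type 2 diabetes)
    # are ICD-10 codes; "SEX_F" is a made-up baseline token.
    history = [Event(20000, "I21"), Event(0, "SEX_F"), Event(15000, "E11")]
    vocab = {"<pad>": 0}
    tokens, ages = build_sequence(history, vocab)
    print(tokens, ages)  # [1, 2, 3] [0.0, 41.07, 54.76]
    print(next_event_examples(tokens))  # [([1], 2), ([1, 2], 3)]
```

A transformer trained on such pairs conditions on the whole prefix, not just the preceding event, which is what lets it learn that an order such as "diabetes, then heart attack" carries information.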

This approach enables risk forecasts years in advance, across a broad set of conditions. Early charts shared by the team show improved stratification for events like heart attacks beyond age-based risk alone.

What this could mean for care teams

  • Targeted screening and monitoring: Prioritize follow-up for patients flagged as high risk before symptoms escalate.
  • Proactive care plans: Inform earlier lifestyle counseling, medication reviews, or specialist referrals based on projected risk windows.
  • Capacity planning: Anticipate demand for cardiology, oncology, or metabolic clinics with population-level risk signals.
  • Registry enrichment: Identify cohorts for preventive trials or disease management programs more efficiently.

Important cautions

The research team notes biases in the British and Danish datasets, including age and ethnicity skews, which can limit generalizability. Health technology expert Peter Bannister also points to these biases as a reason for careful validation before any deployment.

Model predictions must be clinically calibrated, audited for fairness, and stress-tested across diverse EHR systems and coding standards. Until that work is done, these outputs should not guide clinical decisions.
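A basic form of the calibration and fairness audit described above is to compare mean predicted risk with the observed event rate within each subgroup. The sketch below assumes binary outcomes over a fixed horizon and uses a hypothetical `subgroup_calibration` helper; the toy numbers are invented and say nothing about Delphi-2M's actual performance.

```python
from collections import defaultdict
from typing import Dict, Sequence


def subgroup_calibration(risks: Sequence[float], outcomes: Sequence[int],
                         groups: Sequence[str]) -> Dict[str, dict]:
    """Per-subgroup calibration-in-the-large.

    For each group, compare the mean predicted risk with the observed event
    rate. An observed/expected (O/E) ratio well above 1.0 means the model
    under-predicts for that group; well below 1.0 means it over-predicts.
    """
    stats = defaultdict(lambda: {"n": 0, "expected": 0.0, "observed": 0})
    for r, y, g in zip(risks, outcomes, groups):
        s = stats[g]
        s["n"] += 1
        s["expected"] += r   # sum of predicted risks = expected event count
        s["observed"] += y   # actual event count
    report = {}
    for g, s in stats.items():
        report[g] = {
            "n": s["n"],
            "mean_predicted": s["expected"] / s["n"],
            "observed_rate": s["observed"] / s["n"],
            "o_e_ratio": s["observed"] / s["expected"] if s["expected"] > 0 else float("nan"),
        }
    return report


if __name__ == "__main__":
    # Invented example with two hypothetical subgroups "A" and "B".
    report = subgroup_calibration([0.1, 0.2, 0.3, 0.4], [0, 0, 1, 1], ["A", "A", "B", "B"])
    for group, row in report.items():
        print(group, round(row["o_e_ratio"], 2))  # A 0.0 then B 2.86
```

In a real audit each subgroup also needs enough events for the ratio to be stable, so confidence intervals and minimum counts matter as much as the point estimate.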

Practical next steps for healthcare organizations

  • Define use cases: Start with clearly actionable endpoints, such as 1- to 5-year risk of myocardial infarction, stroke, or heart failure exacerbation.
  • Run a silent trial: Integrate predictions into the EHR back end, monitor performance without showing clinicians, and compare against existing risk tools.
  • Evaluate with the right metrics: discrimination (AUROC and area under the precision-recall curve), calibration (calibration plots, Brier score), decision curves, and net reclassification improvement.
  • Set action thresholds: Co-develop with clinical leaders to tie risk tiers to specific follow-ups, testing, or referrals.
  • Governance and safety: IRB review, bias audits by subgroup, monitoring for drift, and clear documentation of model limits.
  • Patient privacy: Strict de-identification and access controls; review local laws for secondary use of health data.
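Several of the evaluation metrics listed above are short enough to compute by hand, which helps when cross-checking numbers during a silent trial. The following is a minimal, dependency-free sketch; the function names and toy risks are hypothetical, and in practice a tested library (for example scikit-learn's `roc_auc_score` and `brier_score_loss`) would normally be used.

```python
from typing import Sequence


def auroc(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """Probability that a random case gets a higher risk than a random non-case
    (Mann-Whitney form); ties count as half. O(n^2): for small samples only."""
    pos = [r for r, y in zip(risks, outcomes) if y == 1]
    neg = [r for r, y in zip(risks, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one case and one non-case")
    score = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                score += 1.0
            elif p == n:
                score += 0.5
    return score / (len(pos) * len(neg))


def brier(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted risk and the 0/1 outcome (lower is better)."""
    return sum((r - y) ** 2 for r, y in zip(risks, outcomes)) / len(risks)


def net_benefit(risks: Sequence[float], outcomes: Sequence[int], threshold: float) -> float:
    """Decision-curve net benefit of acting on everyone with risk >= threshold:
    TP/n - FP/n * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(risks)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


if __name__ == "__main__":
    risks = [0.9, 0.8, 0.3, 0.2, 0.1]   # invented predicted risks
    outcomes = [1, 0, 1, 0, 0]          # invented observed outcomes
    print(round(auroc(risks, outcomes), 3),
          round(brier(risks, outcomes), 3),
          round(net_benefit(risks, outcomes, 0.25), 3))  # 0.833 0.238 0.333
```

Net benefit is usually plotted across a range of thresholds and compared with the "treat everyone" and "treat no one" strategies; that plot is the decision curve, and it is a natural input when agreeing action thresholds with clinical leaders.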

Technical notes for data and informatics teams

  • Data representation: Temporal sequences of diagnoses, procedures, medications, and labs; consistent coding (ICD-10, SNOMED CT, ATC/RxNorm) and event timestamps.
  • Generalizability: Validate on external sites with different demographics, coding practices, and care pathways.
  • Calibration and interpretability: Post-hoc calibration (e.g., isotonic) and clinician-facing explanations (feature attributions, case-based examples) with guardrails.
  • Monitoring: Ongoing checks for performance drift, alert fatigue, and unintended workload shifts.
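Post-hoc isotonic calibration and drift monitoring can both be sketched in plain Python. The code below is a hand-rolled pool-adjacent-violators (PAV) fit that returns a step function (scikit-learn's `IsotonicRegression`, by contrast, interpolates between points), plus a population stability index (PSI) on the distribution of risk scores. PSI is one common drift statistic, chosen here as an assumption rather than anything the Delphi-2M team specifies; bin edges, `eps` and all function names are hypothetical.

```python
import bisect
import math
from collections import defaultdict
from typing import List, Sequence, Tuple


def fit_isotonic(scores: Sequence[float], outcomes: Sequence[int]) -> Tuple[List[float], List[float]]:
    """PAV: fit a non-decreasing step function mapping raw scores to observed
    event rates. Returns (upper score bound of each block, rate of each block)."""
    agg = defaultdict(lambda: [0.0, 0])  # score -> [sum of outcomes, count]; pools tied scores
    for s, y in zip(scores, outcomes):
        agg[s][0] += y
        agg[s][1] += 1
    blocks: List[list] = []  # each block: [sum, count, highest score in block]
    for s in sorted(agg):
        total, n = agg[s]
        blocks.append([total, n, s])
        # Merge backwards while the previous block's mean exceeds the newest
        # block's mean (compared by cross-multiplication to avoid division).
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            total2, n2, s2 = blocks.pop()
            blocks[-1][0] += total2
            blocks[-1][1] += n2
            blocks[-1][2] = s2
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def apply_isotonic(model: Tuple[List[float], List[float]], score: float) -> float:
    """Calibrated rate for a new score: first block whose upper bound >= score,
    clamped to the last block for scores above the training range."""
    uppers, rates = model
    i = bisect.bisect_left(uppers, score)
    return rates[min(i, len(rates) - 1)]


def psi(baseline: Sequence[float], current: Sequence[float],
        edges: Sequence[float], eps: float = 1e-6) -> float:
    """Population stability index between two samples of risk scores over
    fixed bin edges; eps keeps empty bins from producing log(0)."""
    def proportions(xs: Sequence[float]) -> List[float]:
        counts = [0] * (len(edges) - 1)
        for x in xs:
            i = bisect.bisect_right(edges, x) - 1
            counts[min(max(i, 0), len(counts) - 1)] += 1  # clamp out-of-range values
        return [max(c / len(xs), eps) for c in counts]

    expected, actual = proportions(baseline), proportions(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


if __name__ == "__main__":
    model = fit_isotonic([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [0, 1, 0, 0, 1, 1])
    print(model)
    print(apply_isotonic(model, 0.25))
    edges = [0.0, 0.1, 0.2, 0.5, 1.0]
    print(psi([0.05, 0.15, 0.3, 0.7], [0.6, 0.7, 0.8, 0.9], edges))
```

PSI thresholds vary by setting; a frequently quoted rule of thumb treats values above about 0.25 as a substantial shift, but alert levels should be agreed locally and paired with outcome-based performance checks once labels become available.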

What to watch

Delphi-2M is a signal of where predictive healthcare is heading: longitudinal, multi-condition risk modeling that surfaces earlier windows for intervention. The decisive work now is clinical validation, equitable performance across populations, and tight integration with workflows so predictions translate into better outcomes.

Bottom line: The promise is clear, but so is the need for rigorous, inclusive validation and careful rollout. Treat Delphi-2M and similar models as decision-support candidates, not decision makers, until they meet clinical, ethical, and regulatory standards.