AI Model Trained on 57 Million NHS Records Sparks Privacy and Ethical Concerns
An AI model trained on 57 million NHS records could predict health trends and aid diagnosis. Experts warn, however, that privacy and data protection risks remain a major concern.

An AI model called Foresight, trained on the medical data of 57 million NHS patients in England, aims to assist healthcare professionals by predicting disease and hospitalisation rates. While its creators highlight potential benefits, experts voice concerns about privacy risks and data protection.
About the Foresight Model
Initially developed in 2023 using OpenAI’s GPT-3 and data from 1.5 million patients in London hospitals, Foresight has since expanded. The latest version, based on Meta’s open-source LLM Llama 2, incorporates eight NHS datasets collected between November 2018 and December 2023. These include outpatient appointments, hospital visits, vaccination records, and more, covering about 10 billion health events for nearly everyone in England.
The team at University College London, led by Chris Tomlinson, describes Foresight as the world’s first “national-scale generative AI model of health data.” They believe it could eventually support diagnosis, forecast health trends like heart attack risks, and enable earlier interventions for better preventative care.
Privacy and Data Protection Concerns
Despite the potential, the project raises serious privacy questions. The data fed into Foresight was de-identified, removing direct personal identifiers. However, experts warn that the richness of such large datasets makes complete anonymisation difficult.
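To see why, consider a minimal, purely illustrative sketch in Python. Everything below is invented and has nothing to do with Foresight’s actual data or pipeline: the point is simply that removing names and NHS numbers still leaves quasi-identifiers (postcode district, birth year, sex) that can be joined against an outside source to re-attach identities.

```python
# Illustrative only: a toy linkage attack showing why stripping direct
# identifiers does not guarantee anonymity. All values are invented.
import pandas as pd

# A "de-identified" health extract: names and NHS numbers removed,
# but quasi-identifiers remain.
health = pd.DataFrame({
    "postcode_district": ["SW1A", "M1", "LS2"],
    "birth_year": [1984, 1972, 1990],
    "sex": ["F", "M", "F"],
    "diagnosis": ["type 2 diabetes", "hypertension", "asthma"],
})

# A separate dataset (public records, a leak, a marketing list) that
# still carries names alongside the same quasi-identifiers.
public = pd.DataFrame({
    "name": ["A. Smith", "B. Jones", "C. Patel"],
    "postcode_district": ["SW1A", "M1", "LS2"],
    "birth_year": [1984, 1972, 1990],
    "sex": ["F", "M", "F"],
})

# Joining on the shared quasi-identifiers re-attaches identities to the
# supposedly de-identified medical records.
reidentified = health.merge(public, on=["postcode_district", "birth_year", "sex"])
print(reidentified[["name", "diagnosis"]])
```

The richer the dataset, the more combinations of seemingly harmless attributes become unique to a single person, which is exactly why scale cuts both ways here.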
- Luc Rocher from the University of Oxford stresses that protecting patient privacy while building powerful generative AI models remains an unsolved problem.
- Michael Chapman from NHS Digital acknowledges the risk of re-identification, noting that no system can guarantee 100% anonymity with complex health data.
- Yves-Alexandre de Montjoye at Imperial College London highlights the importance of testing whether AI models memorise identifiable information, a test Foresight’s creators plan to conduct (a simplified version of such a check is sketched after this list).
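As a rough illustration of what such a memorisation test can look like, here is a generic sketch, not Foresight’s actual evaluation protocol: feed a model the first half of a (hypothetical) training record and check whether greedy decoding reproduces the rest verbatim. The model name and record below are placeholders.

```python
# Illustrative only: a prefix-completion probe for verbatim memorisation.
# MODEL_NAME and the sample record are placeholders, not real NHS data.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

MODEL_NAME = "gpt2"  # placeholder: any locally available causal LM works for this sketch

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def completes_verbatim(record: str, prefix_fraction: float = 0.5) -> bool:
    """Prompt the model with the start of a record and check whether
    greedy decoding reproduces the remainder exactly."""
    ids = tokenizer(record, return_tensors="pt").input_ids[0]
    split = max(1, int(len(ids) * prefix_fraction))
    prefix, target = ids[:split], ids[split:]
    with torch.no_grad():
        out = model.generate(
            prefix.unsqueeze(0),
            max_new_tokens=len(target),
            do_sample=False,  # greedy: the single most likely continuation
            pad_token_id=tokenizer.eos_token_id,
        )
    continuation = out[0][split:split + len(target)]
    return torch.equal(continuation, target)

# Invented record for illustration.
sample = "Patient 0000: admitted 2021-03-04 with chest pain, discharged 2021-03-06."
print("memorised verbatim:", completes_verbatim(sample))
```

A real audit would sample many records, count near-verbatim as well as exact reproductions, and report an overall rate rather than a single yes/no answer.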
To reduce risks, Foresight operates within a secure NHS data environment accessible only to approved researchers. Infrastructure support comes from Amazon Web Services and Databricks, though these companies do not have data access.
Ethical and Legal Challenges
Many patients are unaware their data is used in this way, which can erode trust. Caroline Green from Oxford points out that people want control over their personal information and transparency about its use.
Opt-out options are limited. Because the data is de-identified and drawn from national NHS datasets, existing opt-out mechanisms do not apply. Patients who have opted out of sharing their GP (family doctor) data will not have that data included, but the rest of their records remain part of the training data.
The General Data Protection Regulation (GDPR) gives individuals the right to withdraw consent to the use of their personal data. However, once a large language model like Foresight has been trained, removing a single record’s contribution from it is effectively impossible.
An NHS England spokesperson stated that because the data is anonymised, GDPR does not apply in this case. Yet the UK’s Information Commissioner’s Office cautions that “de-identified” data should not be confused with fully anonymous data, creating legal ambiguity.
Currently, Foresight is used for COVID-19 research under pandemic-related legal exceptions, which further complicates the data protection landscape. Data privacy advocates argue that patient data embedded in AI models must remain secure and controlled.
Looking Ahead
The debate around Foresight highlights a broader issue in healthcare AI development: ethics and patient rights need to be central from the start, not an afterthought. Protecting sensitive medical data while exploring AI’s potential remains a balancing act that requires ongoing scrutiny.
Healthcare professionals interested in the practical application and ethical use of AI in medicine may benefit from exploring targeted AI training resources. For example, Complete AI Training offers courses tailored for healthcare workers to better understand AI tools and data privacy considerations.