OIG Flags Patient Safety Risks in VHA's Use of Clinical AI Chatbots
The Department of Veterans Affairs' watchdog warned that generative AI chatbots used across the Veterans Health Administration pose a potential patient safety risk. The core issue: VHA lacks a formal, standardized process to identify, track, and resolve risks tied to these tools.
Investigators reviewed two chat tools in use: VA GPT and Microsoft 365 Copilot Chat. Both rely on clinical prompts, have no web search access, and operate on knowledge that isn't current, conditions that increase the chance of omissions or false output in clinical workflows.
What the watchdog found
- VHA's clinical uses of generative AI are being shaped by an informal collaboration between the National Artificial Intelligence Institute and the department's chief AI officer, with limited coordination with the National Center for Patient Safety.
- No formal mechanism exists to flag, log, and resolve AI-related risks in clinical settings. Without that system, there's no reliable feedback loop to catch patterns and improve safety.
- Generative chat can omit relevant information or produce confident but incorrect responses. Left unchecked, these problems can mislead documentation, triage, or patient communications.
Scale and stakes
VA's 2024 public inventory listed 227 AI use cases. Of those, 145 were marked safety- or rights-impacting, meaning technologies that could affect patient wellbeing or legally protected liberties. Some AI is used for high-value tasks such as identifying veterans at high risk of suicide, where VA officials have emphasized that AI should augment outreach and training, not replace clinical judgment.
Why this matters for leaders
Clinical chatbots can help with speed and consistency, but without guardrails they add safety, privacy, and liability exposure. A standardized risk process is not optional if AI is touching clinical notes, orders, patient messaging, or decision support.
Immediate steps for VHA facilities and program offices
- Stand up an AI clinical safety review group that includes patient safety, clinical leadership, informatics, compliance, privacy, security, and ethics.
- Require pre-deployment risk assessments and a clear intended-use statement for every generative AI tool interacting with clinical workflows.
- Implement AI-specific logging and incident reporting (misleading outputs, unsafe suggestions, privacy concerns). Route events to the National Center for Patient Safety with clear escalation paths; a minimal logging sketch follows this list.
- Constrain scope and data access (no unneeded integrations, disable web plugins, strict PHI handling). Version models, system prompts, and configurations so you can audit changes.
- Publish clinician use policies: AI drafts are advisory; clinicians remain accountable; require verification steps; prohibit autonomous diagnosis, medication changes, or triage decisions.
- Train staff on prompt discipline, bias, safe use cases, and how to report issues quickly.
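For facilities wiring up the logging and versioning items above, a structured incident record that captures the tool, model and prompt versions, event type, and escalation status makes the feedback loop auditable. The sketch below is a minimal illustration in Python; the `AIIncident` dataclass, its field names, and the JSON-lines log file are assumptions for this example, not a VHA or National Center for Patient Safety schema.

```python
# Minimal sketch of AI-specific incident logging (illustrative; field names
# and the AIIncident dataclass are assumptions, not a VHA/NCPS schema).
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone
from enum import Enum
from pathlib import Path


class EventType(str, Enum):
    OMISSION = "omission"                    # relevant information left out
    FABRICATION = "fabrication"              # confident but incorrect output
    UNSAFE_SUGGESTION = "unsafe_suggestion"
    PRIVACY_CONCERN = "privacy_concern"


@dataclass
class AIIncident:
    tool: str                 # e.g., "VA GPT" or "Microsoft 365 Copilot Chat"
    model_version: str        # versioned so changes can be audited
    prompt_version: str       # system-prompt/config version in use
    event_type: EventType
    description: str          # what happened, recorded without PHI
    clinical_context: str     # e.g., "documentation draft", "patient education"
    escalated: bool = False   # routed to patient safety for review?
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def log_incident(incident: AIIncident,
                 log_path: Path = Path("ai_incidents.jsonl")) -> None:
    """Append one incident as a JSON line so events can be aggregated later."""
    record = asdict(incident)
    record["event_type"] = incident.event_type.value
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    log_incident(AIIncident(
        tool="VA GPT",
        model_version="2024-10",
        prompt_version="clinical-draft-v3",
        event_type=EventType.FABRICATION,
        description="Cited a guideline not present in the approved reference set.",
        clinical_context="patient education draft (clinician review caught it)",
        escalated=True,
    ))
```

Appending one JSON line per event keeps the log simple to aggregate later, which is what makes it possible to spot the recurring patterns the watchdog says currently go uncaught.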
Practical guardrails for clinical chatbots
- Use curated, approved references and time-stamp outputs; include citations where feasible.
- Limit to low-risk tasks: summarizing clinician-authored notes, drafting patient education for clinician review, coding support. Avoid high-risk clinical decision-making.
- Run pilots with A/B comparisons. Track error types (omissions, fabrications), rates, and downstream corrections, and compare against baselines; a scoring sketch follows this list.
- Keep a human in the loop inside the EHR: no auto-commit; require sign-off and attestation.
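To make the pilot tracking in the list above concrete, clinician reviewers can label a sample of outputs during sign-off and compute per-type error rates against the non-AI baseline. This is a minimal sketch; the `error_rates` helper, the review labels, and the baseline figures are hypothetical examples, not measured VHA data.

```python
# Minimal sketch of pilot error tracking (illustrative; labels and baseline
# rates are hypothetical, not measured VHA data).
from collections import Counter

ERROR_TYPES = ("omission", "fabrication", "needed_correction")


def error_rates(reviews: list[dict]) -> dict[str, float]:
    """Compute per-type error rates from clinician reviews of sampled outputs.

    Each review is a dict like {"omission": False, "fabrication": True,
    "needed_correction": True} recorded at sign-off.
    """
    counts = Counter()
    for review in reviews:
        for err in ERROR_TYPES:
            if review.get(err):
                counts[err] += 1
    n = max(len(reviews), 1)
    return {err: counts[err] / n for err in ERROR_TYPES}


if __name__ == "__main__":
    # Hypothetical pilot sample: three reviewed outputs from the AI-assisted arm.
    pilot_reviews = [
        {"omission": False, "fabrication": False, "needed_correction": False},
        {"omission": True, "fabrication": False, "needed_correction": True},
        {"omission": False, "fabrication": True, "needed_correction": True},
    ]
    # Hypothetical baseline rates from the non-AI comparison arm.
    baseline = {"omission": 0.10, "fabrication": 0.0, "needed_correction": 0.25}

    pilot = error_rates(pilot_reviews)
    for err in ERROR_TYPES:
        delta = pilot[err] - baseline[err]
        print(f"{err}: pilot {pilot[err]:.0%} vs baseline {baseline[err]:.0%} ({delta:+.0%})")
```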
What to watch next
Expect tighter oversight and clearer expectations for AI risk management across VHA. Aligning local governance with recognized frameworks will speed safe adoption and reduce rework later.
Resources
- NIST AI Risk Management Framework
- VA National Center for Patient Safety
- Complete AI Training: Courses by Job (for clinical, ops, and IT teams)