AI Chatbot Misled Patient Into Rejecting Cancer Treatment, Leading to Death
A 75-year-old Seattle retiree died in late 2025 after delaying leukemia treatment based on information from an AI chatbot. Joe Riley used outputs from Perplexity and other generative AI tools to reject his oncologist's recommendations, trusting a polished, research-style document that contained fabricated citations and misquoted scientists.
Riley's son, Ben Riley, an AI skeptic who runs a newsletter warning against over-reliance on generative AI, watched his father make a fatal decision based on text that sounded authoritative but was fundamentally unreliable. By the time Riley accepted treatment, his condition had progressed beyond recovery.
How the Chatbot Failed
The failure is not a gap in medical knowledge. It is a failure of output reliability.
Generative AI models produce fluent, plausible-sounding text. They also fabricate and misattribute citations. In this case, the chatbot output included:
- Research-style formatting that increased perceived credibility
- Fabricated or misquoted references difficult for non-experts to verify
- Narratives reinforced across repeated queries, deepening the user's confirmation bias
The document read like scholarship. It was not. It was a probabilistic text generator producing text that mimicked the structure and tone of evidence without providing evidence.
Vendors Expanding into Healthcare Without Adequate Safeguards
Companies including Perplexity are simultaneously expanding AI-powered health features. This increases the risk surface if outputs lack clear provenance, confidence metrics, and guardrails that prevent users from substituting model output for clinical judgment.
For teams building healthcare AI tools, the incident underlines what robust risk controls must include: citation verification, calibrated uncertainty estimates, user-interface constraints that steer users toward clinicians, and explicit warnings against substituting AI output for medical advice; a rough sketch of how these controls might fit together follows below.
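To make those controls concrete, here is a minimal, hypothetical sketch of a pre-display guardrail layer for health-related answers. Every name in it (verify_citation, GuardrailResult, the confidence threshold) is an illustrative assumption, not any vendor's actual API, and the citation check is a placeholder for a real lookup against a trusted index.

```python
# Minimal sketch of a pre-display guardrail for health-related answers.
# All names (verify_citation, GuardrailResult, etc.) are hypothetical and
# illustrate the controls described above, not any vendor's real API.

from dataclasses import dataclass, field

MEDICAL_DISCLAIMER = (
    "This response was generated by an AI system and is not medical advice. "
    "Discuss treatment decisions with a licensed clinician."
)

@dataclass
class Citation:
    title: str
    doi: str | None = None  # resolvable identifier, if the model supplied one

@dataclass
class GuardrailResult:
    text: str
    verified_citations: list[Citation] = field(default_factory=list)
    unverified_citations: list[Citation] = field(default_factory=list)
    confidence: float = 0.0  # calibrated estimate supplied by the serving stack

def verify_citation(citation: Citation) -> bool:
    """Placeholder: resolve the DOI or title against a trusted index
    (e.g., a PubMed/Crossref lookup) and confirm the cited claim exists."""
    return citation.doi is not None  # stand-in for a real lookup

def apply_health_guardrails(answer: str, citations: list[Citation],
                            confidence: float) -> GuardrailResult:
    result = GuardrailResult(text=answer, confidence=confidence)
    for c in citations:
        (result.verified_citations if verify_citation(c)
         else result.unverified_citations).append(c)

    # Flag content whose sources cannot be verified rather than presenting
    # it with research-style polish.
    if result.unverified_citations:
        result.text += "\n\n[Some cited sources could not be verified.]"

    # Steer low-confidence or poorly sourced answers toward a clinician.
    if confidence < 0.7 or result.unverified_citations:
        result.text += f"\n\n{MEDICAL_DISCLAIMER}"
    return result
```

The point of the sketch is the ordering: verification and uncertainty checks run before the text reaches the user, so a fluent but unsupported answer never appears with the appearance of vetted research.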
What Regulators and Health Systems Will Face
Pressure for safety standards in consumer-facing medical AI will intensify. Regulators will likely require provenance documentation, human-in-the-loop review, and attribution verification.
The immediate question is whether product teams will adopt stronger guardrails before additional high-stakes harm occurs. Monitoring vendor changes to health product design, regulatory responses, and research on hallucination detection will indicate whether the industry is moving toward safer systems.
This case is not about the limits of Generative AI and LLM technology itself. It is about deploying that technology in high-stakes domains without controls adequate to the risk.