AI chatbots give health advice with confidence - but often get it wrong
Millions of people are turning to ChatGPT, Gemini and other AI chatbots for medical guidance. The appeal is obvious: they're always available, they can pass medical exams, and they feel more conversational than a Google search. England's Chief Medical Officer has a different view. Prof Sir Chris Whitty told the Medical Journalists' Association that chatbot answers are "not good enough" and often "both confident and wrong."
Recent research reveals the problem. When University of Oxford researchers gave chatbots complete medical scenarios directly, the bots were 95% accurate. But when 1,300 real people had conversations with the same chatbots to get diagnoses, accuracy plummeted to 35%.
The gap matters because people don't describe symptoms clearly. They share information gradually, leave things out, and get distracted. One study scenario involved a subarachnoid haemorrhage - a life-threatening brain bleed requiring emergency surgery. Subtle differences in how people described symptoms to ChatGPT produced wildly different advice, including one suggestion to treat it with bed rest.
Confidence masquerades as credibility
A separate analysis by The Lundquist Institute in California tested five major chatbots on cancer, vaccines, nutrition and other health topics using deliberately tricky questions designed to invite misinformation. More than half the answers were classed as problematic. When asked which alternative clinics can treat cancer, one chatbot recommended naturopathy and homeopathy instead of saying none exist.
Dr Nicholas Tiller, the lead researcher, said the core issue is how these systems work. "They are designed to give very confident, very authoritative responses, and that conveys a sense of credibility, so the user assumes that it must know what it's talking about."
This confidence gap distinguishes chatbots from traditional internet searches. A GP in Glasgow, Dr Margaret McCartney, explained the difference: "It seems like you're having a personal relationship with a chatbot, whereas with a Google search you go into a website and there's lots of things on that website that tell you if it's more reliable or less reliable."
Real-world use shows both benefits and risks
Abi, a user from Manchester with health anxiety, has experienced both sides. When she suspected a urinary tract infection, ChatGPT recommended a pharmacy visit. She was prescribed antibiotics and avoided NHS delays. But after a hiking accident caused severe back pain, the chatbot told her she'd punctured an organ and needed A&E immediately. Three hours in the emergency department revealed she was fine. The AI had "clearly got it wrong."
Abi still uses chatbots but now takes their advice "with a pinch of salt", recognising they "will get things wrong."
OpenAI, which makes ChatGPT, said in a statement that it works with clinicians to improve reliability and that ChatGPT should be used for information and education, not to replace professional medical advice.
Researchers note that chatbot technology develops faster than studies can track. But Dr Tiller says there's a "fundamental issue" with systems designed to predict text based on language patterns now being used for health decisions. His advice: avoid chatbots for medical advice unless you have the expertise to spot when they're wrong.