Binghamton researchers use multi-model voting system to eliminate AI hallucinations in medical queries

Binghamton University researchers eliminated AI hallucinations in medical queries by having seven chatbots check authoritative databases and then vote on the answers. More than 10,000 experiments produced zero hallucinations.

Published on: Jun 04, 2026

Binghamton Researchers Cut AI Hallucinations by Having Multiple Chatbots Vote on Answers

Researchers at Binghamton University developed a protocol that reduces false information from large language models by forcing seven different AI chatbots to verify medical information against authoritative databases before answering questions.

The method works like this: seven open-source large language models receive the same plain-language symptom description. Each one translates the symptoms into medical terminology with official identification numbers. Then the models vote on which answers are correct.
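The voting step can be pictured with a minimal Python sketch. The article does not describe the researchers' code, so everything here is an assumption for illustration: the `vote` function, the `min_votes` threshold of two (chosen only because the reported results mention support from at least two models), and the placeholder term IDs, which are not real medical codes.

```python
from collections import Counter

def vote(model_outputs, min_votes=2):
    """Tally the term IDs proposed by each model and keep those with enough support.

    min_votes is an assumed threshold, not a value confirmed by the researchers.
    """
    tally = Counter()
    for terms in model_outputs:
        # Count each model at most once per term, even if it repeats itself.
        tally.update(set(terms))
    return {term: n for term, n in tally.items() if n >= min_votes}

# Hypothetical outputs from seven models for one symptom description.
# The IDs are invented placeholders.
outputs = [
    ["TERM:0001", "TERM:0002"],
    ["TERM:0001"],
    ["TERM:0001", "TERM:0003"],
    ["TERM:0001", "TERM:0002"],
    ["TERM:0002"],
    ["TERM:0001"],
    ["TERM:0004"],
]

accepted = vote(outputs)
print(accepted)
```

In this toy run, the terms proposed by only one model fall below the threshold and are dropped, while the two terms with broader support are kept along with their vote counts.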

Results from more than 10,000 experiments showed that 76.85% of answers were supported by at least four models, with the remaining 23.15% supported by at least two. No unmatched terms appeared, and the models produced no hallucinations.

Ahmed Abdeen Hamed, a research fellow at Binghamton's School of Systems Science and Industrial Engineering, led the work with Luis M. Rocha, a professor of systems science. The journal STAR Protocols published their findings in May 2026.

Why This Matters for Medical AI

People increasingly ask ChatGPT and similar tools about health concerns. Last year, Binghamton researchers found that ChatGPT identified disease terms, drug names, and genetic information accurately but also generated a high number of confident false statements.

The new protocol required each model to use retrieval-augmented generation (RAG), meaning it had to consult an authoritative medical database before responding. This constraint eliminated the hallucinations entirely.
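One way to see why grounding removes fabricated terms is a simple lookup check, sketched below in Python. This is not retrieval-augmented generation itself and not the researchers' pipeline: it is a simplified stand-in that only accepts a model's answer when the ID and label match an entry in a reference vocabulary. The article does not name the database or its interface, so `REFERENCE_TERMS`, `ground`, and all entries are hypothetical.

```python
# Stand-in for an authoritative medical database.
# The entries are invented placeholders, not real codes.
REFERENCE_TERMS = {
    "TERM:0001": "example condition A",
    "TERM:0002": "example condition B",
}

def ground(candidates):
    """Split (term_id, label) pairs into those found in the reference and those not."""
    matched, unmatched = [], []
    for term_id, label in candidates:
        # Accept a pair only if the ID exists and carries the same label.
        if REFERENCE_TERMS.get(term_id) == label:
            matched.append((term_id, label))
        else:
            unmatched.append((term_id, label))
    return matched, unmatched

# One hypothetical model answer: the second pair is fabricated.
matched, unmatched = ground([
    ("TERM:0001", "example condition A"),
    ("TERM:0099", "made-up condition"),
])
print(matched)
print(unmatched)
```

Any pair that lands in `unmatched` would be the kind of unsupported term the reported experiments did not observe.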

Hamed said the approach can verify biomedical knowledge across three areas: disease and genetics, translational knowledge from diseases to treatments and clinical trials, and healthcare applications such as symptoms and treatments.

Reproducible and Scalable

A key advantage is reproducibility. The team can run the experiment repeatedly with different random selections of seven models from a pool of 100 open-source options. Each repetition increases confidence in the voting results.
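The repeated-sampling idea can be sketched in Python as follows. This is a toy simulation under stated assumptions, not the published protocol: `run_trials`, `fake_ask`, the model names, the trial count, and the vote threshold are all hypothetical, and `fake_ask` merely imitates model answers instead of calling a real language model.

```python
import random
from collections import Counter

def run_trials(pool, ask, n_trials=50, panel_size=7, min_votes=2, seed=0):
    """Repeat the vote with a fresh random panel each time.

    Returns, for each term, the fraction of trials in which it was accepted.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    accepted_counts = Counter()
    for _ in range(n_trials):
        panel = rng.sample(pool, panel_size)  # seven distinct models per trial
        tally = Counter()
        for model in panel:
            tally.update(set(ask(model)))
        accepted_counts.update(t for t, n in tally.items() if n >= min_votes)
    return {t: c / n_trials for t, c in accepted_counts.items()}

# Hypothetical pool of 100 model names.
pool = [f"model-{i}" for i in range(100)]

def fake_ask(model):
    """Toy stand-in for a model call: every tenth model adds an extra term."""
    idx = int(model.split("-")[1])
    return ["TERM:0001", "TERM:0009"] if idx % 10 == 0 else ["TERM:0001"]

rates = run_trials(pool, fake_ask)
print(rates)
```

A term that every model agrees on is accepted in every trial, while a term proposed by only a small minority of the pool is accepted far less often, which is the intuition behind gaining confidence through repetition.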

"When we perform the experiment many, many times, we increase the confidence in the voting," Hamed said.

The protocol also works beyond medicine. Hamed noted it could eliminate fabricated legal citations, fake academic references, or historical errors in other domains.

Broader Applications Ahead

Rocha's lab at Binghamton is developing "digital twins" for precision medicine: virtual replicas of physical processes that are continuously updated with AI and real-time data. The verification protocol could extract evidence for adverse drug reactions from clinical trials, scientific literature, pharmacological databases, and social media.

The research received a $100,000 grant from New York state's Empire AI Consortium.

Hamed is transitioning from his Binghamton fellowship to a research associate professor role at the University of Nebraska-Lincoln. Rocha said Hamed's work at Binghamton was "most productive" and catalyzed new ideas across the lab.

