Binghamton researchers use multi-model voting system to eliminate AI hallucinations in medical diagnosis

Binghamton University researchers eliminated AI hallucinations in medical diagnosis by running questions through seven chatbots and accepting only answers backed by multiple models. Across 10,000 tests, no false information slipped through.

Published on: May 29, 2026

Binghamton researchers cut AI hallucinations by having multiple chatbots "vote" on answers

Researchers at Binghamton University have developed a method to eliminate false information generated by large language models - a persistent problem when people use AI chatbots for medical diagnosis.

The protocol runs the same question through seven different open-source LLMs, then uses a voting system to verify the answers. Across 10,000 experiments, 76.85% of medical diagnoses were supported by at least four models, and the remaining 23.15% were backed by at least two. No hallucinations appeared.

Ahmed Abdeen Hamed and Luis M. Rocha, researchers in Binghamton's School of Systems Science and Industrial Engineering, developed the approach with funding from New York's Empire AI Consortium. The journal STAR Protocols published their findings.

How the verification works

Each of the seven LLMs received the same plain-language symptom descriptions. Before responding, each had to draw on retrieval-augmented generation (RAG), which anchored its answers to authoritative medical databases. Each model then returned medical terms along with their official identification numbers.

The models submitted their answers to a vote. Answers backed by multiple models made it through; unsupported claims did not.
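
The voting step can be illustrated with a minimal Python sketch. It is not the researchers' published code: the model names, the placeholder identifiers such as TERM:0001, and the function name vote are all hypothetical, and the sketch assumes each model's answer has already been reduced to a list of term identifiers. The thresholds of two and four supporting models come from the figures reported above.

```python
from collections import Counter


def vote(answers_by_model, min_support=2):
    """Keep only the answers proposed by at least `min_support` models.

    `answers_by_model` maps a model name to the list of term identifiers
    that model returned. All names and identifiers here are illustrative.
    """
    counts = Counter()
    for answers in answers_by_model.values():
        # set() ensures a model counts once per answer, even if it repeats itself
        counts.update(set(answers))
    return {answer: n for answer, n in counts.items() if n >= min_support}


# Hypothetical responses from four models to one symptom description.
answers = {
    "model_a": ["TERM:0001", "TERM:0002"],
    "model_b": ["TERM:0001"],
    "model_c": ["TERM:0001", "TERM:0003"],
    "model_d": ["TERM:0001", "TERM:0002"],
}

accepted = vote(answers)                 # backed by at least two models
strong = vote(answers, min_support=4)    # backed by at least four models
print(accepted)
print(strong)
```

In this made-up example, TERM:0003 is proposed by a single model and is therefore dropped, which is how an unsupported claim would be filtered out.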

"The new workflow can verify anything from a biomedical point of view - biological knowledge with disease and genetics, translational knowledge from diseases to treatments, and healthcare applications with symptoms and treatments," Hamed said.

Building confidence through repetition

The protocol can be run repeatedly with different randomly selected LLMs from a pool of 100+ open-source models. Each iteration increases confidence in the voting outcome.
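
As a rough sketch of that repetition, assuming a hypothetical ask(model, question) function that returns a model's list of term identifiers, the loop below draws a fresh random panel of seven models each round and records how often each answer clears the vote. The pool contents, the function names, and the acceptance-rate summary are illustrative assumptions, not details from the published protocol.

```python
import random
from collections import Counter


def repeated_vote(pool, ask, question, rounds=100, panel_size=7,
                  min_support=2, seed=0):
    """Return the fraction of rounds in which each answer passed the vote.

    `ask` is a hypothetical callable: ask(model, question) -> list of answers.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    passed = Counter()
    for _ in range(rounds):
        panel = rng.sample(pool, panel_size)  # random panel, no repeats
        counts = Counter()
        for model in panel:
            counts.update(set(ask(model, question)))
        for answer, n in counts.items():
            if n >= min_support:
                passed[answer] += 1
    return {answer: k / rounds for answer, k in passed.items()}


# Stub standing in for real LLM calls: every model agrees on one term
# and adds one answer that no other model shares.
def fake_ask(model, question):
    return ["TERM:0001", f"NOISE:{model}"]


pool = [f"model_{i}" for i in range(100)]
rates = repeated_vote(pool, fake_ask, "persistent cough and fever")
print(rates)
```

With this stub, the shared term passes in every round, while each model's one-off answer never gathers a second vote and so never appears in the result.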

"When we perform the experiment many, many times, we increase the confidence in the voting," Hamed said.

Rocha noted the method could support development of "digital twins" for precision medicine - virtual replicas of physical processes updated in real time to create predictive simulations of human reactions.

Beyond medical diagnosis

While the study focused on biomedical applications, the verification method could address hallucinations in other domains: fabricated legal citations, fake academic references, or historical errors.

"This protocol is a big step toward the democratization of knowledge verification," Hamed said.

Hamed has moved from Binghamton to a research associate professor role at the University of Nebraska-Lincoln. Rocha credited him with developing AI workflows and catalyzing new research directions within the lab.

For professionals working with generative AI and LLMs, the research demonstrates how structured prompt engineering combined with multi-model verification can reduce the risk of relying on confidently stated but false information.

