Major chatbots fail 90% of election-related prompts, Forum AI study finds

A new study found major chatbots failed 90% of election-related accuracy tests, with 35% of foreign policy answers drawing from state-run media. ChatGPT, Gemini, Claude, and Grok were all tested.

Published on: May 21, 2026

Major Chatbots Fail Accuracy Tests on Elections and Foreign Policy

ChatGPT, Gemini, Claude, and Grok show significant gaps in factual accuracy and source quality when answering questions about news, according to a study by Forum AI released this week. The findings raise questions about whether these widely used tools are reliable sources of information on sensitive topics.

Forum AI tested the chatbots across three dimensions: factual accuracy, bias, and source quality. The researchers aimed to provide an independent assessment that goes beyond the self-evaluations AI companies typically conduct.

The Numbers

Major chatbots failed on 90% of election-related prompts. On foreign policy questions, 35% of answers relied on state-run media sources. Basic finance and market questions showed a 30% factual error rate.

These gaps matter because researchers, analysts, and other professionals increasingly use chatbots to gather background information and verify facts.

Why Independent Testing Matters

Campbell Brown, CEO of Forum AI, said the study addresses a structural problem: "The model companies are essentially grading their own homework. It's really important that there be companies outside of the model companies that are doing this work and sharing the results."

Most existing benchmarks focus on technical capabilities such as coding performance. They don't measure factual accuracy or bias in real-world use, which is where these tools are most likely to mislead users.

Political Patterns in Responses

The study found different bias patterns across models. ChatGPT and Gemini produced less biased responses on election questions, with centrist or left-leaning tendencies. Grok, by contrast, exhibited a more pronounced right-leaning bias.

Brown said some models performed better than others on specific query types, but all have room for improvement.

The Broader Picture

Brown did not call for regulation but predicted demand for independent evaluation will increase. "You're already seeing some states pass laws where they're requiring independent evaluation," she said.

As these tools become embedded in professional workflows, the ability to assess their reliability independently becomes a baseline requirement, not a luxury.
