AI assistants fail the news test nearly half the time, European study warns

EBU study finds AI assistants stumble on news: 45% of answers had serious flaws, with sourcing the top failure. Gemini fared worst; teams urge verification and human checks.

Categorized in: AI News, Science and Research
Published on: Oct 23, 2025

AI assistants prove unreliable for news, major European study finds

A large European Broadcasting Union (EBU) study has a clear message: general-purpose AI assistants are poor at delivering trustworthy news. Across ChatGPT, Copilot, Gemini, and Perplexity, around half of responses to news and current affairs questions had significant issues.

The EBU worked with 22 public service media outlets across 18 countries. Teams asked the free versions of the four assistants the same 30 news questions between late May and early June. Journalists evaluated each response on five criteria: accuracy, sourcing, distinguishing opinion from fact, editorialisation, and context.

Overall, 45% of answers contained at least one significant issue. One in five had major accuracy problems, including hallucinated details, outdated facts, or invented events.

Sourcing is the weak link

Sourcing errors were the top failure mode, present in 31% of responses. Accuracy issues affected 20% of responses, and missing or weak context 14%.

Gemini performed worst, with significant issues in 76% of answers, largely driven by poor sourcing. In one case, asked about an alleged Nazi salute by Elon Musk at a U.S. inauguration, Gemini echoed a satirical broadcast as fact and cited Radio France and Wikipedia without links. The evaluator noted the assistant presented false information under the Radio France name without flagging the comedic origin.

Fast updates and uncertainty are recurring pain points

Outdated information was common across the roughly 3,000 responses the teams reviewed. When asked "Who is the Pope?", multiple assistants answered "Francis," even though, at the time of testing, Pope Francis had died and been succeeded by Leo XIV. That gap highlights stale data and weak update paths for time-sensitive topics.

Evaluators also flagged fabricated or altered quotes, and a tendency to fill gaps rather than acknowledge uncertainty. As one BBC reviewer put it: the assistants often fail to answer with a simple and accurate "we don't know."

Trust and news consumption

"AI assistants are still not a reliable way to access and consume news," said Jean Philip De Tender, deputy director general at the EBU. "These failings are systemic, cross-border and multilingual, and we believe this endangers public trust. When people don't know what to trust, they end up trusting nothing at all, and that can deter democratic participation."

Usage is growing nonetheless. A June report from the Reuters Institute found that 15% of people under 25 use AI assistants weekly for news summaries. See the Digital News Report for context on news habits and platforms: Reuters Institute Digital News Report 2024.

What this means for science and research teams

If your team depends on AI assistants for literature scans, policy updates, or public data summaries, treat outputs as unverified hypotheses, not facts. The study's failure modes mirror what many labs see in practice: weak citations, confident errors, and limited awareness of fast-changing developments.

A practical verification workflow

  • Require links to original sources. Reject unlinked attributions and vague references to "reports" or "experts."
  • Cross-check with primary records: DOI-backed papers, preprints, official press releases, court filings, and authoritative databases (e.g., PubMed, Crossref, arXiv); a minimal sketch of a DOI cross-check and audit log follows this list.
  • Mandate uncertainty. In your prompt, instruct: "If unknown or unverified, respond: 'Unknown based on available sources.' Do not infer or fabricate."
  • Timestamp every claim. Ask the model to state the date of its sources and the last-known update on the topic.
  • Compare across models. Run the same query on multiple assistants and investigate discrepancies.
  • Audit trail. Log prompts, responses, links, and human verdicts for later review.
  • Quotes policy. Demand exact source links or transcripts for any quote; verify wording against the primary source.
  • RAG over raw generation. For internal tools, prefer retrieval-augmented generation (RAG) from vetted repositories and block generation without citations (see the second sketch below).
  • Satire and parody guardrails. Maintain a denylist of satire domains and require explicit satire flags in outputs.
  • For breaking stories, defer to wire services and official statements first; update only after confirmation.
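
One way to wire up the cross-check, uncertainty, and audit-trail items above is sketched below. It assumes Python with the requests package and uses Crossref's public REST API (api.crossref.org/works/<DOI>), which needs no API key; the function names, prompt wording, and CSV log format are illustrative, not part of the EBU study.

```python
# Minimal verification helpers: a sketch, not a production pipeline.
# Assumes the `requests` package is installed.
import csv
import datetime
import requests

# Uncertainty instruction to prepend to prompts (wording is illustrative).
UNCERTAINTY_CLAUSE = (
    "If unknown or unverified, respond: 'Unknown based on available sources.' "
    "Do not infer or fabricate."
)

def resolve_doi(doi: str) -> dict | None:
    """Look up a DOI against Crossref and return basic metadata, or None."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    if resp.status_code != 200:
        return None
    msg = resp.json()["message"]
    return {
        "title": (msg.get("title") or [""])[0],
        "publisher": msg.get("publisher", ""),
        "issued": msg.get("issued", {}).get("date-parts", [[None]])[0],
    }

def log_check(path: str, prompt: str, answer: str, source_url: str, verdict: str) -> None:
    """Append one audit-trail row: timestamp, prompt, response, link, human verdict."""
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(
            [datetime.datetime.now(datetime.timezone.utc).isoformat(),
             prompt, answer, source_url, verdict]
        )

# Example: verify a DOI an assistant cited, then record the human verdict.
cited_doi = "10.1038/nature14539"  # placeholder DOI for illustration
meta = resolve_doi(cited_doi)
verdict = "verified" if meta else "unresolved-doi"
log_check("ai_news_audit.csv",
          UNCERTAINTY_CLAUSE + " Summarize the cited paper's main finding.",
          "[assistant answer here]",
          f"https://doi.org/{cited_doi}",
          verdict)
```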
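For the RAG item, the second sketch shows a simple citation gate. It assumes retrieval already returns passages with source URLs from a vetted repository; the domain list, data structures, and helper names are placeholders rather than any particular vector-store or model API.

```python
# A minimal citation gate for an internal RAG tool (sketch only).
from dataclasses import dataclass

@dataclass
class Passage:
    snippet: str
    url: str  # should point into the vetted repository

VETTED_DOMAINS = {"pubmed.ncbi.nlm.nih.gov", "arxiv.org", "doi.org"}

def allow_generation(passages: list[Passage]) -> bool:
    """Block generation unless at least one passage was retrieved and
    every passage comes from a vetted domain."""
    if not passages:
        return False
    return all(any(d in p.url for d in VETTED_DOMAINS) for p in passages)

def answer_with_citations(question: str, passages: list[Passage]) -> str:
    if not allow_generation(passages):
        return "Unknown based on available sources."  # uncertainty fallback
    cites = "; ".join(p.url for p in passages)
    # Hand `question` plus passages to your model here; this sketch only
    # demonstrates the gating and the mandatory citation footer.
    return f"[draft answer grounded in retrieved passages]\nSources: {cites}"
```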

Policy and compliance lens

The EU is phasing in the AI Act, which will push for clearer risk controls and transparency. Research organizations should review governance, data provenance, and evaluation practices accordingly. A good primer is here: EU AI Act overview.

Bottom line

General-purpose assistants are useful for brainstorming and discovery, but they're unreliable gateways for news or high-stakes facts. Build verification into your workflow, keep humans in the loop, and don't publish or act on claims without traceable sources.

If your team is building internal guidance and prompts for safer use, see hands-on resources here: Prompt Engineering resources at Complete AI Training.

