AI Assistants Stumble On News: 45% Of Answers Had Significant Issues
AI assistants are struggling with news questions. In a cross-market study of 2,709 responses, 45% contained significant issues and 81% had at least one issue. The biggest weakness was sourcing: missing citations, misattribution, and misleading references.
Performance varied by platform. Google Gemini showed the most severe sourcing problems, with 76% of responses containing significant issues and 72% with sourcing issues. Other assistants stayed at or below 37% for major issues overall and below 25% for sourcing issues.
What The Study Measured
The European Broadcasting Union (EBU) and BBC evaluated free/consumer versions of major AI assistants answering news questions. Tests covered ChatGPT, Copilot, Gemini, and Perplexity across 14 languages from 22 public-service media organizations in 18 countries.
Responses were generated between May 24 and June 10 using a shared set of 30 core questions, plus optional local questions. Many participating organizations temporarily lifted technical blocks so assistants could access their content during the test period.
Key Findings
- 45% significant issues; 81% had some issue. Problems were consistent across languages and markets.
- Sourcing was the top failure mode (31% at a significant level). Citations were missing, misattributed, or pointed to off-topic pages.
- Platform spread: Gemini had the highest rate of significant issues (76%) and sourcing problems (72%). Others were notably lower but still showed gaps.
- Accuracy slips: Some answers were outdated or incorrect. One highlighted example: mischaracterizing changes to laws on disposable vapes.
Why This Matters For General, IT, And Development Roles
People trust concise AI summaries, especially under time pressure. If those summaries cite the wrong source, or no source at all, bad information spreads fast and accountability gets murky.
For teams that build with AI, this is a quality and risk problem. For publishers and comms teams, it's a brand problem: your reporting can be misrepresented in answers that appear authoritative at first glance.
Practical Steps To Reduce Risk
- Set a "citations-first" rule. Treat any unsourced claim as unverified. Require links to primary or high-authority sources with dates.
- Use allowlists and source tiers. Prefer official documents, government pages, and recognized outlets. Down-rank blogs and aggregation sites without clear authorship.
- Force structure in prompts. Ask assistants to provide: 1) a short answer, 2) verifiable citations with direct URLs, 3) timestamps, and 4) a confidence note. See the prompt sketch after this list.
- Add retrieval and checks. Pair the model with search or a news API. Pull the latest sources, then have the model summarize and cross-verify before responding.
- Block "fresh news" autopublish. For events in the last 24-48 hours, require human review or display a visible freshness warning.
- Build an evaluation harness. Create a test suite of recurring news prompts in multiple languages. Track citation validity, link accuracy, and factual precision over time, as in the harness sketch after this list.
- Log and audit. Store prompts, outputs, and clicked citations. Spot patterns like repeated misattribution or dead links.
- For publishers: Monitor how assistants reference your content, ensure structured data is clean, and consider guidance pages for AI crawlers.
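To make the "citations-first" and structured-prompt rules concrete, here is a minimal Python sketch. The prompt wording, the `build_news_prompt` helper, and the regex-based check are illustrative assumptions, not part of the study or any particular assistant's API; the point is simply to demand sources up front and treat unsourced answers as unverified.

```python
import re
from datetime import datetime, timezone

def build_news_prompt(question: str) -> str:
    """Build a citations-first prompt: short answer, sources, dates, confidence."""
    return (
        "Answer the news question below. Structure your reply as:\n"
        "1) A short answer (max 3 sentences).\n"
        "2) Citations: direct URLs to primary or high-authority sources.\n"
        "3) The publication date of each cited source.\n"
        "4) A one-line confidence note (high/medium/low and why).\n"
        "If you cannot cite a source for a claim, mark it as UNVERIFIED.\n\n"
        f"Question: {question}\n"
        f"Today's date: {datetime.now(timezone.utc).date().isoformat()}\n"
    )

URL_PATTERN = re.compile(r"https?://\S+")

def has_citations(answer: str, minimum: int = 1) -> bool:
    """Treat any answer without at least one direct URL as unverified."""
    return len(URL_PATTERN.findall(answer)) >= minimum
```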
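A freshness gate for the autopublish rule can be as simple as comparing an event timestamp against a review window. The 48-hour threshold and function name below are assumptions for illustration, not a prescribed standard.

```python
from datetime import datetime, timedelta, timezone

REVIEW_WINDOW = timedelta(hours=48)  # assumed threshold; tune to your risk tolerance

def requires_human_review(event_time: datetime) -> bool:
    """Route stories about very recent events to a human before publishing."""
    return datetime.now(timezone.utc) - event_time < REVIEW_WINDOW

# Example: an event from 6 hours ago gets held for review or shown with a freshness warning.
recent = datetime.now(timezone.utc) - timedelta(hours=6)
print(requires_human_review(recent))  # True
```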
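The evaluation harness and audit log can start small: replay a fixed set of news prompts, extract cited URLs, confirm the links resolve, and store the results. This sketch assumes the third-party `requests` library and a hypothetical `ask_assistant` function that wraps whichever assistant you are testing.

```python
import json
import re
from datetime import datetime, timezone

import requests  # third-party; pip install requests

URL_PATTERN = re.compile(r"https?://[^\s)\"]+")

def link_is_live(url: str) -> bool:
    """Check that a cited URL resolves (HEAD request, short timeout)."""
    try:
        resp = requests.head(url, timeout=5, allow_redirects=True)
        return resp.status_code < 400
    except requests.RequestException:
        return False

def run_eval(prompts: list[str], ask_assistant, log_path: str = "news_eval_log.jsonl") -> None:
    """Replay test prompts, audit citations, and append results to a JSONL log."""
    with open(log_path, "a", encoding="utf-8") as log:
        for prompt in prompts:
            answer = ask_assistant(prompt)  # hypothetical wrapper around your assistant
            urls = URL_PATTERN.findall(answer)
            record = {
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "prompt": prompt,
                "answer": answer,
                "citation_count": len(urls),
                "dead_links": [u for u in urls if not link_is_live(u)],
            }
            log.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Over time, the JSONL log makes patterns like repeated misattribution or dead links easy to spot and report.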
Methodology Notes That Affect Interpretation
- Consumer models only: The study looked at free/consumer versions to reflect typical use.
- Temporary access changes: Some outlets removed blocks so assistants could reach their content during the tests.
- Time-bounded run: Responses were captured in a specific window, which matters for fast-moving stories.
What's Next
The EBU and BBC released a News Integrity in AI Assistants Toolkit with guidance for tech companies, media, and researchers. Reuters also reports concerns that growing reliance on assistants for news could erode public trust (Reuters).
As EBU Media Director Jean Philip De Tender put it: "When people don't know what to trust, they end up trusting nothing at all, and that can deter democratic participation."
Bottom Line
Use AI assistants for speed, not as a single source of truth. Bake verification into your prompts, pipelines, and publishing rules. If the answer doesn't show its receipts, it's not ready for production, or for your reputation.
Level Up Your Team
If your team needs repeatable prompts and review workflows that produce verifiable answers, explore focused training on prompts and evaluations.