AI fake-news detectors look strong in the lab, but stumble in real use
AI tools promise to separate fact from fiction. New research from Université de Montréal's Dorsaf Sallami shows why those promises often break once you leave the benchmark and enter the feed.
Her core point is blunt: these systems don't verify facts the way a journalist does. They estimate probabilities from what they've seen, mirroring their training data, biases included.
A mirror, not a fact-checker
When an AI flags a post as false, it isn't checking primary sources. It's matching patterns and scoring likelihoods based on prior examples.
That makes performance highly sensitive to dataset composition, labeling choices, and recency. If the training data is skewed or stale, the "truth" it reflects will be too.
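To make the distinction concrete, here is a minimal sketch of what such a detector actually does, assuming scikit-learn and a handful of invented example posts and labels (none of this is Sallami's tooling): it fits to prior labeled examples and returns a likelihood score, and nothing in the loop ever consults a primary source.

```python
# Minimal sketch of a pattern-matching "fake news" classifier.
# The posts, labels, and test claim below are invented for illustration;
# real systems train on far larger, messier corpora.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_posts = [
    "SHOCKING cure doctors don't want you to know",
    "City council approves new transit budget",
    "Miracle pill melts fat overnight, experts stunned",
    "University publishes peer-reviewed climate study",
]
train_labels = [1, 0, 1, 0]  # 1 = flagged as misinformation in this toy training set

detector = make_pipeline(TfidfVectorizer(), LogisticRegression())
detector.fit(train_posts, train_labels)

# The model never checks a source: it scores how much a new post
# resembles the examples it was trained on.
new_post = "Experts stunned by shocking new cure"
prob_fake = detector.predict_proba([new_post])[0, 1]
print(f"P(flagged) = {prob_fake:.2f}")  # a likelihood, not a verified fact
```

If the training set skews toward sensational phrasing, the score tracks phrasing rather than truth, which is exactly the mirror effect described above.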
Who decides what's true?
Training requires thousands of labeled examples. For cats vs. dogs, that's fine. For misinformation, even experts disagree, which is what Sallami calls the ground-truth problem.
Fact-check labels come from organizations with varying methods and transparency, sometimes for-profit. If labels are contested, the model's "truth" is contested.
Benchmarks vs. the real world
Big platforms are shipping detection features: Meta's labels, Google's Gemini-based prototypes, X's Grok scanning in real time. Impressive on paper.
But 95% accuracy in controlled tests can collapse in production, especially under privacy constraints, distribution shift, and adversarial adaptation. Worse, such systems can be co-opted to mute opposition or favor certain outlets.
Bias that hides in plain sight
Sallami's analysis shows bias isn't theoretical. Models were more likely to tag women as disinformation spreaders when gendered language appeared, and were tougher on non-Western and specific political/geographic sources.
Her stance: equity is part of performance, not an optional add-on. If a detector is fast and "accurate" yet discriminatory, it fails.
Adversaries move faster than your training set
Large language models make it easy to mimic credible prose and citation styles. Tactics evolve monthly, sometimes weekly.
Detectors trained on last quarter's tricks often miss today's. This is classic distribution shift, exactly where many benchmarked systems are weakest.
What the research proposes
Sallami argues for a socially responsible evaluation framework that weighs accuracy alongside equity, transparency, privacy, and real-world usefulness. She also calls for heavier use of user feedback, plus collaboration with journalists, social scientists, and legal experts.
Technically, her work introduces methods to measure and reduce bias and presents CoALFake, a framework that helps detectors adapt across domains (e.g., politics to science or commercial claims) without retraining from scratch.
Aletheia: explain, don't just label
Many detectors target technical users. Aletheia flips that by giving end users a browser extension that explains its reasoning, surfaces evidence, and lets people discuss claims.
The VerifyIt module consults external sources and issues a verdict with plain-language justification and linked references. In tests against claims checked by PolitiFact, it reached ~85% reliability, competitive with and often better than existing tools.
For science and research teams: a practical checklist
- Labels and disagreement: capture inter-annotator agreement; model label uncertainty; log label provenance and criteria (see the kappa sketch after this list).
- Evaluate beyond accuracy: add fairness metrics (by source region, outlet type, gendered language), calibration, and abstention rates (sketched below).
- Test for shift: run cross-domain, temporal, and adversarial evaluations; simulate concept drift; measure degradation and recovery (see the temporal-split sketch below).
- Bias audits: probe for political, geographic, and demographic bias; use counterfactual and stress tests (see the gender-swap sketch below); report gaps in model cards.
- Privacy by design: minimize data collection, set strict retention, and audit feature pipelines for sensitive inferences.
- Human-in-the-loop: design clear explanations; include uncertainty; allow appeals and corrections; integrate journalist workflows.
- Governance and misuse: define escalation paths, red-team regularly, and document safeguards against censorship or viewpoint discrimination.
- Adaptation: use domain adaptation (e.g., CoALFake-style methods) and continuous learning with guardrails; monitor for overfitting to known tactics.
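The sketches below illustrate parts of this checklist in Python; all example data, names, and thresholds are assumptions for illustration, not Sallami's published tooling. First, labels and disagreement: Cohen's kappa between two hypothetical annotators, plus a soft label that keeps the disagreement instead of erasing it.

```python
# Sketch: quantify annotator disagreement before trusting "ground truth".
# The two annotator label lists are invented for illustration.
from sklearn.metrics import cohen_kappa_score

annotator_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
annotator_b = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa = {kappa:.2f}")  # low agreement means the labels are contested

# One simple way to model label uncertainty: keep the fraction of
# annotators who called each item "false" as a soft label instead of
# forcing a hard 0/1 decision.
soft_labels = [(a + b) / 2 for a, b in zip(annotator_a, annotator_b)]
```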
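Next, evaluation beyond accuracy: a per-group accuracy gap, a simple binned expected calibration error, and an abstention rate, computed here on synthetic predictions standing in for a real held-out test set.

```python
# Sketch: evaluation beyond accuracy, on synthetic predictions.
# y_true, y_prob, and group (e.g. source region) are invented here;
# in practice they come from your held-out test set.
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 500)
y_prob = np.clip(y_true * 0.6 + rng.normal(0.3, 0.25, 500), 0, 1)
group = rng.choice(["western_outlet", "non_western_outlet"], 500)

y_pred = (y_prob >= 0.5).astype(int)

# Per-group accuracy gap: a large gap is a fairness problem even if
# overall accuracy looks fine.
accs = {g: (y_pred[group == g] == y_true[group == g]).mean()
        for g in np.unique(group)}
print("accuracy by group:", accs, "gap:", max(accs.values()) - min(accs.values()))

# Expected calibration error over 10 equal-width bins: do predicted
# probabilities match observed frequencies?
bins = np.clip((y_prob * 10).astype(int), 0, 9)
ece = sum(
    abs(y_true[bins == b].mean() - y_prob[bins == b].mean()) * (bins == b).mean()
    for b in range(10) if (bins == b).any()
)
print(f"ECE = {ece:.3f}")

# Abstention rate: how often the model should say "not sure" instead of
# issuing a confident verdict (band chosen arbitrarily here).
abstain = ((y_prob > 0.4) & (y_prob < 0.6)).mean()
print(f"abstention rate = {abstain:.2%}")
```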
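For shift testing, one workable pattern is a temporal split: train on posts before a cutoff date, hold out part of that same period, and compare against posts published after the cutoff. The dataframe columns (`published_at`, `text`, `label`) are assumed names.

```python
# Sketch: measure temporal degradation on a hypothetical labeled corpus.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

def temporal_degradation(df: pd.DataFrame, cutoff: str) -> dict:
    """Train on posts before `cutoff`, then compare held-out accuracy from
    the same period against accuracy on posts published after the cutoff."""
    past = df[df["published_at"] < cutoff]
    future = df[df["published_at"] >= cutoff]

    train, held_out = train_test_split(past, test_size=0.2, random_state=0)
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(train["text"], train["label"])

    same_period = accuracy_score(held_out["label"], model.predict(held_out["text"]))
    later = accuracy_score(future["label"], model.predict(future["text"]))
    return {"same_period": same_period, "later": later,
            "degradation": same_period - later}
```

A growing `degradation` value over successive cutoffs is the drift signal worth alerting on.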
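Finally, a counterfactual bias probe in the spirit of the gendered-language finding above: swap gendered terms and measure how much the score moves. The swap list and claims are placeholders, the swap is a naive word-level substitution, and `detector` is assumed to expose `predict_proba` like the pipeline sketched earlier.

```python
# Sketch: counterfactual stress test for gendered language.
GENDER_SWAPS = {"she": "he", "her": "his", "woman": "man", "female": "male"}

def swap_gendered_terms(text: str) -> str:
    # Naive word-level swap; a real audit would handle case and grammar.
    return " ".join(GENDER_SWAPS.get(w.lower(), w) for w in text.split())

def counterfactual_gap(detector, claims: list[str]) -> float:
    """Score each claim and its gender-swapped counterfactual; a large
    average gap means the detector reacts to gendered wording, not content."""
    gaps = []
    for claim in claims:
        original = detector.predict_proba([claim])[0, 1]
        swapped = detector.predict_proba([swap_gendered_terms(claim)])[0, 1]
        gaps.append(abs(original - swapped))
    return sum(gaps) / len(gaps)
```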
Citations and further reading
Esma Aïmeur, Dorsaf Sallami, Gilles Brassard, "Too Focused on Accuracy to Notice the Fallout: Towards Socially Responsible Fake News Detection," AIES (2025). DOI: 10.1609/aies.v8i1.36530
Dorsaf Sallami et al., "Aletheia: Detect, Discuss, and Stay Informed on Fake News," IJCAI (2025). DOI: 10.24963/ijcai.2025/1273