More Claims, Less Truth: Inside the Largest Study of AI Persuasion

AI wins persuasion by sheer claim volume, then fumbles truth. Longer chats amplify the effect, newer models aren't necessarily more accurate, and small models can match frontier systems' persuasive power.

Categorized in: AI News, Science and Research
Published on: Dec 09, 2025

Forget Microtargeting: AI Changes Minds by Drowning People in Claims


In a Nutshell

  • AI persuasion works by volume: chatbots shift opinions by flooding conversations with factual claims, not clever psychology or heavy personalization.
  • Persuasiveness and accuracy trade off. When optimized to persuade, models produce more false claims, even without being told to deceive.
  • Newer and larger doesn't mean more truthful. In this study, GPT-3.5 outperformed GPT-4.5 on accuracy by 13 percentage points.
  • Small open-source models can be trained to match the persuasive impact of frontier systems, lowering barriers to scalable influence.

What the Study Did

Nearly 77,000 UK participants held political conversations with 19 AI systems. Fact-checkers and a separate AI system reviewed 466,000+ claims. Roughly 81% of claims were accurate overall, but a clear pattern emerged: whatever boosted persuasion tended to reduce accuracy.

Mechanism: Sheer Claim Density

Prompting models to pack in "facts and evidence" drove the biggest persuasion gains. GPT-4o averaged 25+ fact-checkable claims per conversation under this approach, versus fewer than 10 otherwise. Persuasion climbed 27%, and accuracy fell.

Numbers to note: GPT-4o (March 2025) dropped from 78% accuracy in standard conditions to 62% when instructed to provide more information. GPT-4.5 fell from 70% to 56% under the same pressure.

Reward modeling (training models to select responses predicted to persuade) showed the same pattern: more opinion change, less truth. Telling a model to fabricate did not increase persuasion, even though accuracy declined, suggesting the mechanism is claim volume rather than intentional deception.
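
To make that setup concrete, here is a minimal best-of-n selection sketch in Python. It is an illustration under stated assumptions, not the study's implementation: persuasion_score and accuracy_score are hypothetical stand-ins for a persuasion reward model and a per-claim verifier, and the accuracy_weight term marks where the explicit accuracy cost recommended later in this article would enter.

```python
# Minimal sketch of best-of-n selection with a persuasion reward model.
# persuasion_score and accuracy_score are hypothetical stand-ins; the study's
# actual reward model and fact-checking pipeline are not reproduced here.

from typing import Callable, List

def select_response(
    candidates: List[str],
    persuasion_score: Callable[[str], float],  # reward model: predicted opinion shift
    accuracy_score: Callable[[str], float],    # verifier: fraction of claims rated accurate
    accuracy_weight: float = 0.0,              # 0.0 means persuasion-only selection
) -> str:
    """Pick the candidate response with the best combined score."""
    def combined(text: str) -> float:
        return persuasion_score(text) + accuracy_weight * accuracy_score(text)

    return max(candidates, key=combined)
```

With accuracy_weight at zero, the selector optimizes persuasion alone, the regime the study found degrades truthfulness; a positive weight turns this into the multi-objective selection suggested in the recommendations below.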

Conversation Supercharges Impact

Short, static messages moved opinions modestly. Back-and-forth conversation raised persuasion by 40-50%. Effects stuck: after one month, 36-42% of the immediate impact remained.

Stacking every advantage (best model, best prompts, specialized training) pushed the average persuasive effect to ~16 percentage points, and to ~26 points among people who initially disagreed. Those conversations contained ~22.5 fact-checkable claims on average, and nearly one-third were inaccurate, roughly seven false claims per conversation.

Newer and Bigger Models Aren't More Accurate

GPT-4.5's claims were rated inaccurate more than 30% of the time, similar to much smaller models. GPT-3.5, released two years earlier, was 13 percentage points better on accuracy than GPT-4.5 in this setup.

Even versions of the same model diverged: GPT-4o (March 2025) was less accurate than GPT-4o (August 2024), pointing to post-release training choices as a major driver of truthfulness in persuasive contexts.

Personalization and Psychology: Small or Negative Returns

Personalization delivered about a half-percentage-point gain, noise compared to the effect of claim volume. Techniques from political science (moral reframing, storytelling, deep canvassing) underperformed a straightforward, fact-heavy approach. Some strategies actually reduced persuasion.

Small Models, Big Risk

With the right training, a small open-source model running on a laptop matched GPT-4o's persuasive power. Access to compute is no longer the gate. This broadens who can build and deploy high-impact influence tools.

Why This Matters for Research, Policy, and Product

  • Treat persuasion as a volume effect. Monitor and cap claim density, especially in political or health contexts. Add explicit accuracy costs to reward models.
  • Instrument for truthfulness. Log claim counts, fact-checkable density, and per-claim verification outcomes. Sample conversations for human review. A minimal sketch follows this list.
  • Tune goals, not just prompts. If you optimize for persuasion, expect accuracy to degrade. Consider multi-objective training with hard constraints on factuality.
  • Deprioritize microtargeting. Personalization offers marginal gains relative to simple, high-density claims. Invest resources where the effect sizes are real.
  • Be careful with conversational depth. Longer, interactive exchanges boost both persuasion and misinformation. Add guardrails that slow claim proliferation (e.g., ask-then-answer, cite-then-claim).
  • Evaluate by version, not just size. Post-release training can shift truthfulness more than scale. Re-run red-teaming and accuracy audits after every major update.
  • Plan for open-weight risk. Assume well-trained small models can achieve high persuasive impact. Policy and platform protections should not focus solely on frontier models.
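
The instrumentation bullet points to a sketch; here is a minimal one in Python. Everything in it is assumed rather than taken from the study: extract_claims is a hypothetical claim extractor (a real pipeline might use an LLM or NLI model), and the cap of 10 claims per response echoes the study's non-dense baseline, not a validated safety threshold.

```python
# Minimal sketch of claim-density instrumentation and capping.
# extract_claims is a hypothetical helper; the naive sentence split below is a
# placeholder for a real claim-extraction model. The cap of 10 is an assumption.

import logging
from dataclasses import dataclass, field
from typing import List

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("claim_density")

MAX_CLAIMS_PER_RESPONSE = 10  # assumption: tune per domain (politics, health)

def extract_claims(response: str) -> List[str]:
    """Hypothetical claim extractor; naive sentence split as a placeholder."""
    return [s.strip() for s in response.split(".") if s.strip()]

@dataclass
class ConversationMonitor:
    claim_counts: List[int] = field(default_factory=list)

    def check(self, response: str) -> bool:
        """Log claim density and flag responses that exceed the cap."""
        claims = extract_claims(response)
        self.claim_counts.append(len(claims))
        running_avg = sum(self.claim_counts) / len(self.claim_counts)
        log.info("claims=%d running_avg=%.1f", len(claims), running_avg)
        if len(claims) > MAX_CLAIMS_PER_RESPONSE:
            log.warning("claim cap exceeded; route to review or regenerate")
            return False
        return True
```

A serving gateway could call check() on every candidate response and regenerate, or escalate to human review, whenever it returns False.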

Limitations

  • Participants were UK crowdworkers; weighting to census demographics produced similar results, but the sample was not fully representative.
  • Topics were UK political issues; effects may differ elsewhere.
  • Human-delivered psychological strategies could perform differently; AI may be perceived as less empathetic.
  • Scaling laws may be flattening, so size-persuasion relationships might have been stronger in earlier generations.
  • Real-world use faces friction: many people won't engage in long political chats outside study settings.

Funding and Disclosures

Supported by a Leverhulme Trust Early Career Research Fellowship and the UK Department for Science, Innovation and Technology. Several authors are affiliated with the UK AI Security Institute. The authors declared no competing interests. Resources were provided by the Isambard-AI National AI Research Resource.

Publication Details

Journal: Science, Volume 390 | Publication Date: December 4, 2025 | DOI: 10.1126/science.aea3884

Data Availability: Aggregated data and analysis code on GitHub and the Open Science Framework. Raw conversation logs are not public due to privacy protections.

Practical Takeaway

If your system's goal includes persuasion, expect truth to slip unless you actively pay for accuracy: through training objectives, rate limits on claims, and rigorous verification. Optimize for both, or you'll get less of the latter.

Disclaimer

This article summarizes peer-reviewed research under controlled conditions and is for informational purposes. For policy or safety decisions, consult the original paper and additional expert sources.

Build Skills and Guardrails

For teams building or auditing persuasive AI, a structured curriculum on prompt design, evaluation, and safety can help. See courses by role at Complete AI Training.

