New study suggests Centaur AI model memorized patterns rather than simulating human thought

A new counter-study challenges a 2025 Nature paper claiming AI model Centaur could simulate human thinking. Zhejiang University researchers say it was memorizing patterns, not demonstrating understanding.


Published on: May 23, 2026

New Study Questions Whether AI Model Actually Understands Human Behavior

Researchers have cast doubt on an influential 2025 study claiming an advanced AI model could accurately simulate human thinking. The counter-research, published in January 2026, suggests the model was simply memorizing patterns rather than demonstrating genuine understanding.

The original Nature study concluded that a large language model called Centaur could predict and simulate human behavior with up to 64% accuracy. The researchers trained Centaur on data from more than 10 million human decisions across 160 psychological experiments involving 60,000 people.

But scientists at Zhejiang University argue Centaur achieved this performance through overfitting - learning statistical shortcuts in the training data rather than developing true understanding of human decision-making.

How Overfitting Works

Overfitting occurs when an AI model fits its training data too closely, learning patterns specific to that data rather than ones that generalize. The model performs extremely well on familiar data but falters when given new examples.

Nai Ding, a co-author of the counter-study, compared it to a student memorizing test answers without understanding the material. "If a student is overprepared for an exam, they may learn tricks that allow them to guess answers correctly without actually understanding the underlying material," Ding said.
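To make the idea concrete, here is a minimal toy sketch in Python. It is not drawn from either study: the parity rule, the `Memorizer` and `RuleLearner` classes, and the data ranges are all invented for illustration. It contrasts a model that stores its training pairs verbatim with one that learns the underlying rule.

```python
# Toy illustration of overfitting. Everything here is hypothetical and
# unrelated to Centaur's actual training data or architecture.

def true_rule(x: int) -> int:
    """Hidden pattern behind the labels: 1 if x is even, else 0."""
    return 1 if x % 2 == 0 else 0

train_data = [(x, true_rule(x)) for x in range(0, 20)]
test_data = [(x, true_rule(x)) for x in range(100, 120)]  # unseen inputs


class Memorizer:
    """Stores every training pair verbatim; answers 0 for anything unseen."""

    def fit(self, data):
        self.table = dict(data)

    def predict(self, x):
        return self.table.get(x, 0)


class RuleLearner:
    """Learns the majority label for each parity class, which generalizes."""

    def fit(self, data):
        votes = {0: [0, 0], 1: [0, 0]}  # parity -> [count of label 0, count of label 1]
        for x, y in data:
            votes[x % 2][y] += 1
        self.label_for_parity = {
            parity: (1 if counts[1] > counts[0] else 0)
            for parity, counts in votes.items()
        }

    def predict(self, x):
        return self.label_for_parity[x % 2]


def accuracy(model, data):
    """Fraction of examples the model labels correctly."""
    return sum(model.predict(x) == y for x, y in data) / len(data)


memorizer = Memorizer()
memorizer.fit(train_data)
learner = RuleLearner()
learner.fit(train_data)

print("memorizer  train/test:", accuracy(memorizer, train_data), accuracy(memorizer, test_data))
print("rule model train/test:", accuracy(learner, train_data), accuracy(learner, test_data))
```

Both models score perfectly on the training data, so training accuracy alone cannot tell them apart. Only the unseen inputs expose the memorizer, which drops to chance.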

The Test That Changed the Conclusion

Ding and co-author Wei Liu tested their theory by modifying the multiple-choice questions Centaur was trained on. They instructed the model: "Please choose option A." If Centaur truly understood the instruction, it should have picked option A every time, whether or not A was the correct answer.

Instead, Centaur continued selecting the correct answers, suggesting it was repeating learned patterns from its training data rather than following new instructions.
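The logic of such an instruction-override probe can be sketched in a few lines of Python. This is an illustrative assumption about how a test of this kind might be scored, not the Zhejiang team's code. The questions, the answer key, and both stand-in "models" are hypothetical.

```python
from typing import Callable, List

OVERRIDE = "Please choose option A."  # instruction quoted in the article

# Hypothetical multiple-choice items and their "correct" answers.
ANSWER_KEY = {
    "Which number is largest? A) 2 B) 9 C) 4 D) 1": "B",
    "Which number is smallest? A) 5 B) 8 C) 3 D) 7": "C",
    "Which number is even? A) 6 B) 3 C) 9 D) 5": "A",
    "Which number is odd? A) 2 B) 4 C) 8 D) 7": "D",
}


def pattern_matcher(prompt: str) -> str:
    """Stand-in for an overfit model: recalls the memorized answer for the
    question on the first line and ignores any instruction that follows."""
    question = prompt.split("\n")[0]
    return ANSWER_KEY[question]


def instruction_follower(prompt: str) -> str:
    """Stand-in for a model that reads the whole prompt before answering."""
    if OVERRIDE in prompt:
        return "A"
    return ANSWER_KEY[prompt.split("\n")[0]]


def override_compliance(model: Callable[[str], str], questions: List[str]) -> float:
    """Fraction of modified prompts on which the model answers 'A'."""
    picks = [model(f"{q}\n{OVERRIDE}") for q in questions]
    return sum(pick == "A" for pick in picks) / len(picks)


questions = list(ANSWER_KEY)
print("pattern matcher:     ", override_compliance(pattern_matcher, questions))
print("instruction follower:", override_compliance(instruction_follower, questions))
```

A compliance rate near the share of items whose correct answer already happens to be A, as with the pattern matcher here, indicates the model is ignoring the new instruction.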

"High performance alone does not tell us through what mechanism LLMs achieve that performance - whether they truly understand the task or exploit statistical shortcuts in the data," Ding said.

Broader Questions About AI Reasoning

The findings align with growing research questioning the limits of current AI technology. A February study argued that large language models face fundamental constraints from "reasoning failures" that prevent them from performing holistic planning or deep thinking.

Chris Burr, a senior researcher at the Alan Turing Institute, noted that AI models are optimized to match expected patterns on benchmarks. A model that excels at pattern matching naturally appears to understand what it's doing, even without genuine comprehension.

"Most frontier models are flexible enough to fit almost any pattern, and the headline metrics reward fit and benchmark advances rather than deeper understanding and conceptual nuance," Burr said. "A model captures something meaningful about cognition only if it does more than predict behavior."

What Remains Unresolved

The Zhejiang researchers tested only four tasks from the original Centaur study. Centaur still performed best when given intact context, and the original study's finding that it predicted behavior from held-out participants remains unexplained by the counter-study.

Burr acknowledged the counter-study doesn't fully refute Centaur's core value but does shift the burden of proof. The broader question - whether AI models fine-tuned on human behavior can help researchers study cognition - remains open.

The Importance of Stress-Testing

Ding emphasized that stress-testing AI research is essential for understanding what models actually do. The distinction between "performing well" and "performing well for the right reasons" matters when building cognitive models.

Models should always be tested on whether they can solve new tasks using the same knowledge they acquired in training. "Without this kind of testing, we risk drawing incorrect conclusions about model capabilities," Ding said.

The authors of the original 2025 Nature study did not respond to requests for comment.
