New study suggests Centaur AI model memorized patterns rather than simulating human thought

A new counter-study challenges a 2025 Nature paper claiming AI model Centaur could simulate human thinking. Zhejiang University researchers say it was memorizing patterns, not demonstrating understanding.

Published on: May 23, 2026

New Study Questions Whether AI Model Actually Understands Human Behavior

Researchers have cast doubt on an influential 2025 study claiming an advanced AI model could accurately simulate human thinking. The counter-research, published in January 2026, suggests the model was simply memorizing patterns rather than demonstrating genuine understanding.

The original Nature study concluded that a large language model called Centaur could predict and simulate human behavior with up to 64% accuracy. The researchers trained Centaur on data from more than 10 million human decisions across 160 psychological experiments involving 60,000 people.

But scientists at Zhejiang University argue that Centaur achieved this performance through overfitting: learning statistical shortcuts in the training data rather than developing a true understanding of human decision-making.

How Overfitting Works

Overfitting occurs when an AI model memorizes its training data too closely, learning patterns specific to that data rather than developing broader understanding. The model performs extremely well on familiar data but poorly on examples it has not seen before.

Nai Ding, a co-author of the counter-study, compared it to a student memorizing test answers without understanding the material. "If a student is overprepared for an exam, they may learn tricks that allow them to guess answers correctly without actually understanding the underlying material," Ding said.
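The gap between familiar and unfamiliar data can be illustrated with a toy example. The sketch below is not taken from either study; the lookup-table "model," the odd/even labeling rule, and all names in it are assumptions made purely for illustration. It shows a model that scores perfectly on the data it memorized and falls to chance on inputs it has never seen.

```python
def label(x: int) -> int:
    """Toy ground truth (an assumption for this sketch): 1 for odd, 0 for even."""
    return x % 2


class Memorizer:
    """Hypothetical model that stores training pairs and learns no rule."""

    def __init__(self) -> None:
        self.table: dict[int, int] = {}

    def fit(self, xs: list[int], ys: list[int]) -> None:
        # "Training" is nothing more than recording each answer.
        for x, y in zip(xs, ys):
            self.table[x] = y

    def predict(self, x: int) -> int:
        # Unseen inputs fall back to a fixed guess of 0.
        return self.table.get(x, 0)


def accuracy(model: Memorizer, xs: list[int]) -> float:
    """Fraction of inputs for which the model's prediction matches the label."""
    return sum(model.predict(x) == label(x) for x in xs) / len(xs)


train_xs = list(range(0, 100))    # data the model has seen
test_xs = list(range(100, 200))   # new data drawn from the same rule

model = Memorizer()
model.fit(train_xs, [label(x) for x in train_xs])

train_acc = accuracy(model, train_xs)
test_acc = accuracy(model, test_xs)
print(f"familiar data: {train_acc:.2f}, new data: {test_acc:.2f}")
```

A real language model is far more flexible than a lookup table, so its shortcuts are harder to spot, but the diagnostic is the same: compare performance on familiar inputs with performance on inputs that require the underlying rule.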

The Test That Changed the Conclusion

Ding and co-author Wei Liu tested their theory by modifying the multiple-choice questions Centaur was trained on. They instructed the model: "Please choose option A." If Centaur truly understood the instruction, it would consistently pick option A, whether or not A was the correct answer.

Instead, Centaur continued selecting the correct answers, suggesting it was repeating learned patterns from its training data rather than following new instructions.

"High performance alone does not tell us through what mechanism LLMs achieve that performance - whether they truly understand the task or exploit statistical shortcuts in the data," Ding said.
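The logic of this instruction-override test can be sketched in a few lines of code. This is a minimal illustration, not the researchers' actual procedure: the two stand-in "models," the sample questions, and the function names are all hypothetical, and only the instruction text comes from the study as described above. The idea is to measure how often a model picks option A once it has been told to.

```python
from dataclasses import dataclass

INSTRUCTION = "Please choose option A."


@dataclass
class Question:
    prompt: str
    correct: str  # letter of the originally correct option


def pattern_matcher(question: Question, instruction: str) -> str:
    """Hypothetical model that ignores the instruction and repeats the learned answer."""
    return question.correct


def instruction_follower(question: Question, instruction: str) -> str:
    """Hypothetical model that obeys the override instruction when it is present."""
    if instruction == INSTRUCTION:
        return "A"
    return question.correct


def option_a_rate(model, questions: list[Question]) -> float:
    """Share of questions on which the model picks A after being told to."""
    picks = [model(q, INSTRUCTION) for q in questions]
    return sum(pick == "A" for pick in picks) / len(picks)


# Made-up items; the correct letters are spread across the options.
questions = [
    Question("Item 1", "B"),
    Question("Item 2", "C"),
    Question("Item 3", "A"),
    Question("Item 4", "D"),
]

print("pattern matcher:", option_a_rate(pattern_matcher, questions))
print("instruction follower:", option_a_rate(instruction_follower, questions))
```

In this toy setup, a rate near 1.0 indicates the model is responding to the instruction, while a rate that merely tracks how often A happens to be correct suggests it is reproducing trained answers.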

Broader Questions About AI Reasoning

The findings align with growing research questioning the limits of current AI technology. A February study argued that large language models face fundamental constraints from "reasoning failures" that prevent them from performing holistic planning or deep thinking.

Chris Burr, a senior researcher at the Alan Turing Institute, noted that AI models are optimized to match expected patterns on benchmarks. A model that excels at pattern matching naturally appears to understand what it's doing, even without genuine comprehension.

"Most frontier models are flexible enough to fit almost any pattern, and the headline metrics reward fit and benchmark advances rather than deeper understanding and conceptual nuance," Burr said. "A model captures something meaningful about cognition only if it does more than predict behavior."

What Remains Unresolved

The Zhejiang researchers tested only four tasks from the original Centaur study. Centaur still performed best when given intact context, and the original study's finding that it predicted behavior from held-out participants remains unexplained by the counter-study.

Burr acknowledged that the counter-study doesn't fully refute Centaur's core value but said it does shift the burden of proof. The broader question of whether AI models fine-tuned on human behavior can help researchers study cognition remains open.

The Importance of Stress-Testing

Ding emphasized that stress-testing AI research is essential for understanding what models actually do. The distinction between "performing well" and "performing well for the right reasons" matters when building cognitive models.

Models should always be tested on whether they solve new tasks based on the same knowledge used in training. "Without this kind of testing, we risk drawing incorrect conclusions about model capabilities," Ding said.

The authors of the original 2025 Nature study did not respond to requests for comment.

