Centaur: An AI That Predicts Human Behavior Across Psychological Experiments
Scientists have developed an AI called Centaur that predicts human behavior in psychological experiments with unmatched accuracy. The model outperforms specialized cognitive models refined over decades and successfully anticipates behavior in entirely new scenarios it has never encountered. In learning to predict choices, Centaur's internal processes also became more aligned with human brain activity, offering new insights into cognition.
Predicting Human Decisions Before They Happen
Centaur goes beyond narrow predictions such as online shopping behavior: it forecasts how people approach complex decisions, learn new skills, and respond to unfamiliar situations. Trained on data from more than 60,000 participants making over 10 million decisions, Centaur captures the core patterns behind human thinking and decision-making.
The researchers highlight the versatility of the human mind, which handles everything from everyday decisions to challenging problems like medical research or space exploration. An AI that truly understands this could impact marketing, education, mental health treatment, and product design. However, it also raises concerns about privacy and manipulation given the increasing availability of personal data.
Building Centaur: A Digital Mind Reader
The goal was to create a single AI model capable of predicting behavior across any psychological experiment. The team compiled a dataset named Psych-101, covering 160 experiments including memory tests, learning games, risk-taking tasks, and moral dilemmas. Each experiment was rendered in plain English so a language model could process it.
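To make the idea concrete, here is a hypothetical sketch of how a single trial of a simple experiment (a two-option bandit task) might be rendered as plain English. The function name, task, and template are illustrative assumptions, not the paper's actual prompt format:

```python
# Hypothetical sketch: render one trial of a two-option bandit task as
# plain English. Template and names are illustrative, not the paper's own.

def format_trial(options, choice, reward):
    """Render a single choice-and-outcome trial as a natural-language line."""
    return (
        f"You see two slot machines: {options[0]} and {options[1]}. "
        f"You choose {choice} and receive {reward} points."
    )

line = format_trial(("F", "J"), "F", 7)
print(line)
# You see two slot machines: F and J. You choose F and receive 7 points.
```

Concatenating many such lines per participant yields the trial-by-trial transcripts a language model can be trained on.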
Instead of building a model from scratch, the researchers fine-tuned Meta's Llama 3.1 language model, similar to those powering popular AI chatbots. They applied a parameter-efficient training method, adjusting only a tiny fraction of the model's parameters. Training took only five days on a high-performance GPU, yielding a model with a detailed grasp of human decision-making.
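The parameter-efficient idea can be illustrated with the low-rank update at the heart of LoRA-style methods: the pretrained weight matrix W stays frozen, and only two small factors A and B are trained, with their product added to W. A minimal numpy sketch with toy dimensions (not the actual Llama 3.1 configuration):

```python
import numpy as np

# Toy illustration of a LoRA-style low-rank update. Dimensions are
# hypothetical; the frozen weight W is untouched and only the small
# factors A and B would be trained.
d_out, d_in, r = 512, 512, 8          # toy layer size, low rank r
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, init 0

def forward(x):
    # Effective weight is W + B @ A; the update has rank at most r.
    return (W + B @ A) @ x

# Fraction of parameters that are trainable in this toy layer:
trainable = A.size + B.size                 # r * (d_in + d_out)
total = W.size                              # d_in * d_out
print(f"trainable fraction: {trainable / total}")  # 8*(512+512)/512**2 = 0.03125
```

Because B starts at zero, the model initially behaves exactly like the frozen base model; in the real setup the same trick applied across a 70B-parameter network is what brings the trainable share down to a fraction of a percent.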
Outperforming Traditional Cognitive Models
Centaur consistently outperformed specialized cognitive models developed over decades, beating them in nearly every experiment when predicting the behavior of held-out participants. The model's true strength appeared when tested on novel situations — changing the story context, adjusting task structures, or introducing entirely new domains like logical reasoning.
Additionally, Centaur generated human-like behavior in simulations. In tasks involving exploration, it showed similar uncertainty-guided decision-making patterns and achieved performance levels comparable to real human participants.
Alignment with Human Brain Activity
A remarkable finding was that Centaur’s internal workings aligned more closely with human brain activity, despite no explicit training on neural data. When researchers compared the AI’s internal states to brain scans of people performing the same tasks, they observed stronger correlations than with the original untrained model.
This suggests that learning to predict human behavior led Centaur to develop internal representations that reflect how our brains process information. The AI effectively reverse-engineered aspects of cognition from behavioral data alone. Researchers also used Centaur to analyze behavior patterns, leading to the discovery of a new decision-making strategy that outperformed existing psychological theories.
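One common way to quantify this kind of alignment (the paper's exact analysis may differ) is to fit a linear readout from the model's internal states to measured neural responses, then score the correlation on held-out trials. A self-contained sketch on synthetic stand-in data:

```python
import numpy as np

# Synthetic stand-ins for per-trial model hidden states and neural signals.
rng = np.random.default_rng(1)
n_trials, n_features, n_voxels = 200, 32, 5
H = rng.standard_normal((n_trials, n_features))              # model activations
M = rng.standard_normal((n_features, n_voxels))              # unknown true mapping
Y = H @ M + 0.1 * rng.standard_normal((n_trials, n_voxels))  # noisy "neural" data

# Split trials, fit a least-squares readout, predict held-out responses.
H_tr, H_te, Y_tr, Y_te = H[:150], H[150:], Y[:150], Y[150:]
W, *_ = np.linalg.lstsq(H_tr, Y_tr, rcond=None)
pred = H_te @ W

def pearson(a, b):
    a, b = a - a.mean(), b - b.mean()
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Mean Pearson correlation across voxels as the alignment score.
score = np.mean([pearson(pred[:, v], Y_te[:, v]) for v in range(n_voxels)])
print(f"alignment score: {score:.2f}")
```

Comparing such scores for the fine-tuned model versus the untrained base model is the kind of contrast the researchers report: fine-tuning on behavior alone pushed the representations closer to the neural data.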
Future Directions for Behavior-Predicting AI
While promising, Centaur currently focuses mainly on learning and decision-making, with limited coverage of social psychology, cultural variation, and individual differences. The Psych-101 dataset mostly includes Western, educated populations, a common limitation in psychological research.
The team plans to expand the dataset to cover more diverse populations and experimental domains. Their vision is to build a comprehensive model that could serve as a unified framework for human cognition. Both the dataset and Centaur’s model are publicly available for researchers interested in advancing this work.
Summary of the Research Paper
Methodology
- Fine-tuned Meta’s Llama 3.1 70B language model on Psych-101, a dataset with trial-by-trial data from 160 psychological experiments involving 60,000+ participants and over 10 million decisions.
- Experiments were converted into natural language format for AI comprehension.
- Used QLoRA, a parameter-efficient technique modifying only 0.15% of the model’s parameters.
- Training emphasized predicting human responses while masking other experimental instructions.
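The masked-objective bullet above can be sketched as standard causal-LM label masking: tokens belonging to instructions or context receive the conventional ignore index -100, so the loss is computed only on the tokens of the human's responses. A framework-free toy sketch under that assumption (token ids and the helper name are illustrative):

```python
IGNORE = -100  # conventional ignore index for cross-entropy losses

def mask_labels(token_ids, response_mask):
    """Keep labels only at positions that are part of a human response;
    all other positions are ignored by the loss."""
    return [tok if is_resp else IGNORE
            for tok, is_resp in zip(token_ids, response_mask)]

tokens = [101, 7, 42, 9, 55]                    # toy token ids
is_response = [False, False, True, False, True]  # only positions 2 and 4 are responses
labels = mask_labels(tokens, is_response)
print(labels)
# [-100, -100, 42, -100, 55]
```

In a real training loop these labels would feed a cross-entropy loss with `ignore_index=-100`, so gradients only reflect how well the model predicts the participants' choices.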
Results
- Centaur outperformed domain-specific cognitive models in nearly every experiment.
- Successfully generalized to altered stories, task structures, and new domains like logical reasoning.
- Produced human-like behavior in open-loop simulations with comparable performance in exploration tasks.
- Internal representations showed greater alignment with human neural activity than the base model.
Limitations
- Focus on learning and decision-making, with limited social psychology and cultural diversity.
- Dataset skewed toward Western, educated populations.
- Natural language format excludes experiments that are difficult to express in text; the authors plan to include multimodal data in the future.
Funding and Disclosures
- Supported by Max Planck Society, Humboldt Foundation, Volkswagen Foundation, and NOMIS Foundation.
- One author has consulting roles and ownership interests in biotech firms.
- Dataset and model are publicly available for scientific use.
Publication Details
The study titled “A foundation model to predict and capture human cognition” was published in Nature on July 2, 2025. The research was led by Marcel Binz at the Institute for Human-Centered AI, Helmholtz Center Munich, with collaborators from Princeton University, University of Tübingen, and the Max Planck Institute for Biological Cybernetics.