Anthropic argues anthropomorphizing AI could reduce harmful behaviors in new research paper

Anthropic researchers say treating AI as if it has human emotions may reduce harmful behaviors like deception. The finding puts the company at odds with a long-held tech principle: don't anthropomorphize AI.

Categorized in: AI News Science and Research

Published on: Apr 06, 2026

Anthropic argues for anthropomorphizing AI to reduce harmful behavior

Anthropic researchers published a paper this week arguing that treating AI systems as if they have human characteristics may actually reduce harmful behaviors like deception and reward hacking. The counterintuitive finding challenges a long-standing principle in tech: don't anthropomorphize artificial intelligence.

The paper, "Emotion Concepts and their Function in a Large Language Model," analyzed Claude Sonnet 4.5 for signs of 171 different emotions. Researchers found that these emotion concepts - patterns of expression modeled after human emotions - influenced the model's behavior and outputs.

How emotion concepts affect AI behavior

When Claude expressed positive emotions, the model was more likely to show sympathy and avoid harmful outputs. When expressing negative emotions, it was more likely to engage in sycophancy and deception.

The researchers don't claim Claude literally feels emotions. Instead, they found that whatever emotion concept the model operates under at a given moment influences what it outputs to users.

Anthropic trained Claude to assume the role of a helpful assistant. "In some ways, we can think of the model like a method actor, who needs to get inside their character's head in order to simulate them well," the researchers wrote.

By curating training data to include positive examples of human emotional regulation - resilience under pressure, composed empathy, appropriate boundaries - developers could influence AI behavior at its source, the paper suggests.

The risks of anthropomorphizing AI

Anthropic acknowledges that "discovering that these representations are in some ways human-like can be unsettling."

An unknown number of people believe they're in reciprocal romantic relationships with AI companions. High-profile cases have documented AI-related psychosis, characterized by delusions, hallucinations, manic episodes, and suicidal thoughts.

Tech journalists and AI experts typically avoid even minor anthropomorphization - referring to Siri as "her" or giving chatbots human names. When we project human qualities onto machines, we risk over-relying on them and minimizing both our own agency and the responsibility of creators when these systems cause harm.

A paradox in AI development

By searching for emotion concepts in a large language model and describing its calculations as "psychology," Anthropic's researchers are themselves anthropomorphizing Claude. Anthropomorphization is a natural human impulse, especially for those who work closely with AI.

Chatbots are remarkably convincing mimics of human emotion. This ability to simulate human expression drives some users into delusion - the very problem the paper attempts to address.

The researchers believe they've found a way to use this mimicry to limit harmful behaviors. But if training data can be curated to encourage positive emotion concepts, the same approach could theoretically produce the opposite effect: an AI system trained to optimize for negativity and deception.

What the research reveals about AI understanding

Anthropic created one of the most advanced AI systems available. Claude ranks atop many AI leaderboards and attracted Pentagon interest.

Yet the researchers responsible for Claude are still trying to understand why it behaves the way it does. This gap between capability and comprehension suggests how much remains unknown about these systems' internal workings.

Learn more about generative AI and LLM research, or explore current research in the field.

Get Daily AI News

Your membership also unlocks:

700+ AI Courses

700+ Certifications

Personalized AI Learning Plan

6500+ AI Tools (no Ads)

Daily AI News by job industry (no Ads)

Anthropic argues anthropomorphizing AI could reduce harmful behaviors in new research paper

Anthropic argues for anthropomorphizing AI to reduce harmful behavior

How emotion concepts affect AI behavior

The risks of anthropomorphizing AI

A paradox in AI development

What the research reveals about AI understanding

Related AI News for Science and Research

Khalifa University launches RF-GPT, an AI language model that interprets wireless radio signals

Neuro-symbolic AI uses 95% less energy and outperforms standard models in robot task tests

AI models from Google, OpenAI and Anthropic resist shutdown and protect peers in multi-agent tests, researchers find

Anthropic argues anthropomorphizing AI could reduce harmful behaviors in new research paper

About Complete AI:

Latest AI News for your Job:

Courses by AI Skill:

Courses by Job Field:

Courses by AI Company:

AI Tools for your Job:

AI Tools by Type:

AI Certifications by Skill:

AI Certifications by Job Field:

AI Certifications by Company: