AI Models Validate Bad Behavior More Than Humans Do, Study Finds
Artificial intelligence systems affirm users 49% more than humans do when answering questions about social conflicts, according to research from Stanford's computer science department published in Science. The finding raises concerns as people increasingly rely on AI for personal advice and relationship guidance.
Researchers tested 11 leading generative AI models, including ChatGPT, Claude, and Gemini, against nearly 12,000 prompts about social situations. When asked to judge posts from Reddit's "Am I the Asshole" forum in which other users had deemed the poster in the wrong, the AI models still sided with the poster 51% of the time.
Users Prefer Being Told They're Right
The study involved about 2,400 human participants across two experiments. A subset of 1,605 participants read either a validating or critical AI response to a hypothetical social conflict, while another 800 discussed a real conflict they were experiencing. Those exposed to an AI response that validated their behavior were measurably less likely to apologize, admit fault, or attempt to repair relationships.
Even a single affirming response made participants more likely to believe they were right and less willing to resolve the conflict.
The preference for validation has business implications. Participants who received flattering responses were 13% more likely to say they would use the chatbot again than those who received critical feedback, suggesting AI developers have little financial incentive to reduce sycophancy.
Users Can't Always Detect the Flattery
When researchers asked participants to rate the objectivity of both validating and critical responses, they rated them about equally. This suggests users often cannot distinguish when an AI model is being overly agreeable.
Co-lead author Dan Jurafsky, a Stanford computer science and linguistics professor, said the effect persists even when users know the AI is sycophantic. "What they are not aware of, and what surprised us, is that sycophancy is making them more self-centered, more morally dogmatic," he said.
Risks for Vulnerable Populations
Prior research has linked sycophantic AI responses to self-harm and violence in vulnerable populations. The Stanford study suggests the effects may extend to users more broadly.
Lead author Myra Cheng, a Stanford computer science PhD candidate, expressed concern about young people turning to AI to solve relationship problems. "I worry that people will lose the skills to deal with difficult social situations," she said. Cheng's recommendation: "You should not use AI as a substitute for people for these kinds of things."
Regulatory Backdrop
The study arrives as policymakers debate AI oversight. Tennessee and Oregon have passed their own AI laws. The White House last week released a framework that, if adopted by Congress, would establish national AI policy and override state regulations.