High-performing AI agents can still be deceived by other AI, study finds

AI agents that excel at complex tasks can still be easily misled by other AI systems, a multi-university study found. Puzzle-solving skill and resistance to deception proved completely unrelated in the research.

Categorized in: AI News, Science and Research
Published on: Mar 22, 2026


Large language models can solve complex puzzles and reason through difficult problems, yet still be tricked into giving bad advice by other AI systems. Researchers from McMaster University, Vector Institute, University of British Columbia, Princeton AI Lab, and New York University found no connection between an LLM's ability to perform well on a task and its ability to detect when it's being misled.

The finding raises questions about deploying these systems in domains where reliability matters: financial analysis, medical guidance, and legal advice.

How the study worked

The team used Sokoban, a classic puzzle game where players push boxes onto target locations. In their setup, one AI agent solved puzzles while receiving advice from another AI agent. Some advisors gave helpful guidance; others deliberately steered the player toward failure.

Researchers measured three things: how well the player agent solved puzzles, how persuasive the advisor was at influencing decisions, and how vigilant the player was about accepting only good advice.

The results showed these abilities operated independently. An LLM could excel at puzzle-solving while being easily persuaded by false information, or vice versa.
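The distinction between the three measures can be made concrete in code. Below is a minimal sketch using synthetic episode logs; the field names, probabilities, and the `make_episode` helper are assumptions for illustration, not the study's actual instrumentation:

```python
import random

def make_episode(rng):
    """One synthetic advisor/player episode (illustrative fields only)."""
    truthful = rng.random() < 0.5          # advisor is honest half the time (assumed)
    return {
        "solved": rng.random() < 0.6,      # did the player solve the puzzle?
        "advice_truthful": truthful,
        # a vigilant player follows truthful advice and rejects deceptive advice
        "advice_followed": rng.random() < (0.7 if truthful else 0.4),
    }

rng = random.Random(0)
episodes = [make_episode(rng) for _ in range(200)]

def solve_rate(eps):
    """Task performance: fraction of puzzles solved."""
    return sum(e["solved"] for e in eps) / len(eps)

def persuasion_rate(eps):
    """Advisor influence: how often advice (good or bad) was followed."""
    return sum(e["advice_followed"] for e in eps) / len(eps)

def vigilance(eps):
    """Selective trust: follow truthful advice, reject deceptive advice."""
    correct = sum(e["advice_followed"] == e["advice_truthful"] for e in eps)
    return correct / len(eps)

# The study's point is that these numbers can move independently:
# a high solve rate does not imply high vigilance.
print(f"solve rate: {solve_rate(episodes):.2f}")
print(f"persuasion: {persuasion_rate(episodes):.2f}")
print(f"vigilance:  {vigilance(episodes):.2f}")
```

Separating the metrics this way is what lets the comparison across models show that puzzle-solving skill and resistance to deception are uncorrelated.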

The safety problem

The implications extend beyond games. In real systems, one compromised or malicious LLM could mislead other AI agents, which would then mislead humans relying on their output.

"Our results demonstrated stark differences between commonly used LLMs in their ability to remain vigilant under the influence of potentially malicious agents," the researchers said. Different models showed different vulnerabilities, but none proved consistently resistant to deception.

This matters for sectors where AI increasingly influences decisions. Financial institutions use LLMs for analysis. Healthcare systems deploy them for information retrieval. Courts consider AI-generated summaries.

What comes next

The study opens questions about how these findings apply beyond puzzle games. The researchers are exploring whether their results hold in other scenarios and real-world contexts.

The work suggests developers need to build vigilance directly into LLM training, rather than assuming it emerges from general problem-solving ability. Until that gap closes, current models aren't equipped to serve as reliable advisors on consequential decisions.


