Salesforce Study Finds AI Agents Struggle With Complex Business Tasks, Failing 65% of Multi-Turn Interactions
A Salesforce AI Research study finds that enterprise AI agents fail 65% of multi-turn CRM tasks, with success rates falling from 58% on single-turn tasks to 35% on multi-turn ones.

A recent benchmark study from Salesforce AI Research reveals that leading AI agents struggle significantly with complex business tasks. While these agents achieve a 58% success rate in single-turn customer relationship management (CRM) tasks, their performance drops sharply to 35% in multi-turn interactions. This gap highlights the challenges AI faces in real-world enterprise environments involving customer service, sales, and pricing workflows.
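The headline figures are related by simple arithmetic, which the short sketch below makes explicit. It uses only the two success rates reported above; the variable names are illustrative and do not come from the study. A 35% success rate corresponds to the 65% failure rate in the headline, a drop of 23 percentage points from single-turn performance, or roughly a 40% decline relative to the single-turn baseline.

```python
# Success rates reported in the article (as fractions).
single_turn_success = 0.58
multi_turn_success = 0.35

# Failure rate is the complement of the success rate.
multi_turn_failure = 1.0 - multi_turn_success

# Absolute drop, expressed in percentage points.
drop_points = (single_turn_success - multi_turn_success) * 100

# Relative drop, measured against the single-turn baseline.
relative_drop = (single_turn_success - multi_turn_success) / single_turn_success

print(f"Multi-turn failure rate: {multi_turn_failure:.0%}")
print(f"Absolute drop: {drop_points:.0f} percentage points")
print(f"Relative drop: {relative_drop:.1%}")
```

Keeping the absolute and relative figures separate matters when comparing models, since a gap quoted in percentage points and one quoted as a percentage of the baseline can look very different.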
What the Study Shows
Salesforce AI Research introduced CRMArena-Pro, a benchmark designed to evaluate AI agent capabilities across 19 distinct CRM tasks. It covers both Business-to-Business (B2B) and Business-to-Consumer (B2C) scenarios, involving over 83,000 synthetic records validated by CRM professionals for realism.
The study assessed key skills such as database querying, numerical computation, information retrieval, workflow execution, and policy compliance. Among these, workflow execution was the easiest for AI agents, with success rates above 83% in single-turn tasks. Confidentiality awareness, however, was a major weakness: agents showed almost no inherent understanding of how to handle sensitive information unless specifically prompted. Prompting them to do so improved confidentiality awareness but came at the cost of task accuracy.
Performance Across Models and Tasks
- Leading AI models tested included OpenAI’s o1 and GPT-4o, Google’s Gemini-2.5-Pro and Gemini-2.5-Flash, and Meta’s LLaMA series.
- Models designed with stronger reasoning capabilities outperformed others by 12-20% in task completion.
- All models showed steep performance declines when shifting from single-turn tasks to multi-turn dialogues, often failing to obtain necessary information through clarification.
- About 45% of the failures were due to incomplete information gathering during multi-turn interactions.
Cost-efficiency analysis indicated Google’s Gemini-2.5 models offered the best balance of performance and expense. OpenAI’s o1 performed well but at a considerably higher cost, which may limit its enterprise adoption.
Implications for Customer Support and Sales Teams
For professionals working with CRM systems, these findings highlight that current AI agents are not yet reliable enough to fully automate complex, multi-step business workflows—especially those involving sensitive customer data. The trade-off between confidentiality and task success suggests caution when deploying AI for tasks that require strict data privacy.
Multi-turn interactions remain a particular challenge, with AI agents often failing to ask clarifying questions or gather all needed information. This limits their usefulness in dynamic customer support or sales scenarios where conversations naturally evolve over multiple exchanges.
What’s Next for Enterprise AI Agents?
The research points to the need for improved AI tools with better reasoning and collaborative capabilities. Approaches like “agent chaining,” where specialized AI agents work together on complex tasks, could help overcome current limitations.
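As a rough illustration of what "agent chaining" can look like in code, the sketch below passes a shared task state through a sequence of specialized steps. It is a minimal sketch under stated assumptions, not the approach used in the study: the agent names, the `TaskState` structure, and the stubbed outputs are all hypothetical, and in a real system each step would call a model or a CRM API rather than return a fixed string.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TaskState:
    """Shared state handed from one agent to the next (hypothetical structure)."""
    request: str
    notes: Dict[str, str] = field(default_factory=dict)


# An agent is modeled as a function that reads the state and returns an updated one.
Agent = Callable[[TaskState], TaskState]


def retrieval_agent(state: TaskState) -> TaskState:
    # Placeholder: a real agent would query CRM records here.
    state.notes["retrieval"] = f"records matching: {state.request}"
    return state


def policy_agent(state: TaskState) -> TaskState:
    # Placeholder: a real agent would check company policy and confidentiality rules.
    state.notes["policy"] = "checked against policy (stub)"
    return state


def drafting_agent(state: TaskState) -> TaskState:
    # Placeholder: a real agent would compose the reply from the earlier notes.
    state.notes["drafting"] = f"reply drafted from {len(state.notes)} prior notes"
    return state


def run_chain(agents: List[Agent], state: TaskState) -> TaskState:
    """Run each specialized agent in order, passing the state along."""
    for agent in agents:
        state = agent(state)
    return state


result = run_chain(
    [retrieval_agent, policy_agent, drafting_agent],
    TaskState(request="renewal quote for an existing customer"),
)
for step, note in result.notes.items():
    print(f"{step}: {note}")
```

The design choice here is that each step only sees and extends a shared state, so a specialized step (for example, a confidentiality check) can be added or replaced without changing the others.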
Key Dates
- May 24, 2025: Research paper submitted to arXiv
- June 10, 2025: Public release of CRMArena-Pro benchmark and study findings
- June 11, 2025: Broader industry discussions highlight enterprise AI limitations
This study provides a clear snapshot of where AI stands in handling complex CRM tasks today. While AI agents show promise, there’s still a long road ahead before they can reliably support multi-turn business workflows in customer service and sales.