Multimodal AI could reduce customer effort in support by grounding responses in visual context

Text-only AI support cuts costs but shifts the burden to customers, who must decode vague answers, rephrase questions, and verify instructions before acting. Real-world failures, like Air Canada's chatbot giving wrong refund advice, show the stakes.

Published on: Apr 17, 2026

AI Chatbots Are Making Customers Work Harder

Text-only AI support has cut company costs and expanded coverage. But it's shifted the burden to customers. They now parse walls of generated text, judge its accuracy, rephrase questions when the AI misses the point, and verify instructions before acting on them.

This creates what Shan Lilja, co-founder of Mavenoid, calls a "hidden tax on every AI interaction, paid by customers." The tax compounds through abandoned sessions, repeat contacts, and eroded trust.

The Hallucination Problem

The most damaging form of this tax is what Lilja terms "AI slop": low-quality, generic AI-generated content that sounds confident but isn't always right.

The consequences are real. Air Canada paid compensation after its chatbot gave incorrect refund information. Woolworths had to adjust its chatbot after it falsely claimed to have family experiences. DPD's chatbot was convinced to swear and write disparaging poems about the company.

These aren't edge cases. An AI that tells a customer to press the wrong button can damage a product or create a safety issue.

Why Text Fails

Language is, as Lilja puts it, "a tree of possibilities." Every sentence carries multiple interpretations. When the AI picks the wrong one, the customer bears the cost of correcting it.

Visual information works differently. A photograph of a cracked component, a real-time error light, or a video showing which cable is loose constrains interpretation. Reality is harder to misrepresent than words.

This property is called "visual grounding," and it's one of six that define effective AI for customer support. The others are enhanced context, reduced ambiguity, cross-modal consistency, state awareness, and real-time feedback.
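As a toy illustration of the idea (the error codes and observations below are invented for this sketch, not Mavenoid's implementation), visual grounding can be thought of as intersecting the interpretations a sentence allows with the states an image actually supports:

```python
# Hypothetical sketch: visual grounding as constraint intersection.
# All codes and labels here are invented for illustration.

# Interpretations a text-only model must choose between.
TEXT_CANDIDATES = {
    "the light is blinking": {"E01_door_open", "E02_drain_blocked", "E03_overheat"},
}

# States a photo of the appliance actually supports.
VISUAL_CANDIDATES = {
    "red_led_two_flashes": {"E02_drain_blocked"},
}

def ground(utterance: str, observation: str) -> set:
    """Keep only interpretations consistent with both the words and the image."""
    return TEXT_CANDIDATES.get(utterance, set()) & VISUAL_CANDIDATES.get(observation, set())

# Text alone leaves three possibilities; the photo narrows it to one.
print(ground("the light is blinking", "red_led_two_flashes"))
```

In a real system the sets would come from a language model and a vision model rather than lookup tables, but the effect is the same: the image prunes the "tree of possibilities" before the customer has to.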

The Feedback Loop Problem

In text-based support, a customer can follow instructions for 10 or 15 minutes before discovering step three was wrong. By then, they've potentially made things worse and lost confidence in the AI's next suggestion.

Real-time visual feedback prevents this. A video guide can flag immediately if a customer is cleaning a washing machine drain filter incorrectly. A live visual check on hardware installation can catch a misconnected cable before the customer powers the device back on and damages it.
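A minimal sketch of that step-by-step visual check (the steps and state labels are invented; a real system would classify camera frames, not compare strings):

```python
# Hypothetical sketch: verify each step visually before moving on,
# instead of letting an early mistake compound for 10-15 minutes.

STEPS = [
    {"id": "unplug", "instruction": "Unplug the machine", "expected": "power_off"},
    {"id": "open_flap", "instruction": "Open the filter flap", "expected": "flap_open"},
    {"id": "remove_filter", "instruction": "Unscrew the drain filter", "expected": "filter_out"},
]

def guided_repair(steps, observe):
    """Stop at the first step whose observed state doesn't match the expected one."""
    for step in steps:
        actual = observe(step["id"])  # e.g. a vision model labeling a live frame
        if actual != step["expected"]:
            return f"Stop at '{step['id']}': expected {step['expected']}, saw {actual}"
    return "all steps verified"

# Simulated camera: the customer forgot to unplug the machine.
print(guided_repair(STEPS, lambda step_id: "power_on" if step_id == "unplug" else "ok"))
```

The design point is the early return: the check halts at the first mismatched step, which is exactly the failure mode text-only support cannot catch until the customer reports it.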

Brands that recognize this can move from improving satisfaction scores to building support customers can actually rely on, where the AI's confidence is grounded in something real.

