Multimodal AI could reduce customer effort in support by grounding responses in visual context

Text-only AI support cuts costs but shifts the burden to customers, who must decode vague answers, rephrase questions, and verify instructions before acting. Real-world failures, like Air Canada's chatbot giving wrong refund advice, show the stakes.

Published on: Apr 17, 2026

AI Chatbots Are Making Customers Work Harder

Text-only AI support has cut company costs and expanded coverage. But it's shifted the burden to customers. They now parse walls of generated text, judge its accuracy, rephrase questions when the AI misses the point, and verify instructions before acting on them.

This creates what Shan Lilja, co-founder of Mavenoid, calls a "hidden tax on every AI interaction, paid by customers." The tax compounds through abandoned sessions, repeat contacts, and eroded trust.

The Hallucination Problem

The most damaging form of this tax is what Lilja terms "AI slop": low-quality, generic AI-generated content that sounds confident but isn't always right.

The consequences are real. Air Canada paid compensation after its chatbot gave incorrect refund information. Woolworths had to adjust its chatbot after it falsely claimed to have family experiences. DPD's chatbot was convinced to swear and write disparaging poems about the company.

These aren't edge cases. An AI that tells a customer to press the wrong button can damage a product or create a safety issue.

Why Text Fails

Language is, as Lilja puts it, "a tree of possibilities." Every sentence carries multiple interpretations. When the AI picks the wrong one, the customer bears the cost of correcting it.

Visual information works differently. A photograph of a cracked component, a real-time error light, or a video showing which cable is loose constrains interpretation. Reality is harder to misrepresent than words.

This property is called "visual grounding," and it's one of six that define effective AI for customer support. The others are enhanced context, reduced ambiguity, cross-modal consistency, state awareness, and real-time feedback.
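As a toy illustration of the idea (the error codes and observations below are invented for this sketch, not Mavenoid's implementation), visual grounding can be thought of as intersecting the interpretations a sentence allows with the states an image actually supports:

```python
# Hypothetical sketch: visual grounding as constraint intersection.
# All codes and labels here are invented for illustration.

# Interpretations a text-only model must choose between.
TEXT_CANDIDATES = {
    "the light is blinking": {"E01_door_open", "E02_drain_blocked", "E03_overheat"},
}

# States a photo of the appliance actually supports.
VISUAL_CANDIDATES = {
    "red_led_two_flashes": {"E02_drain_blocked"},
}

def ground(utterance: str, observation: str) -> set:
    """Keep only interpretations consistent with both the words and the image."""
    return TEXT_CANDIDATES.get(utterance, set()) & VISUAL_CANDIDATES.get(observation, set())

# Text alone leaves three possibilities; the photo narrows it to one.
print(ground("the light is blinking", "red_led_two_flashes"))
```

In a real system the sets would come from a language model and a vision model rather than lookup tables, but the effect is the same: the image prunes the "tree of possibilities" before the customer has to.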

The Feedback Loop Problem

In text-based support, a customer can follow instructions for 10 or 15 minutes before discovering step three was wrong. By then, they've potentially made things worse and lost confidence in the AI's next suggestion.

Real-time visual feedback prevents this. A video guide can flag immediately if a customer is cleaning a washing machine drain filter incorrectly. A live visual check on hardware installation can catch a misconnected cable before the customer powers the device back on and damages it.
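A minimal sketch of that step-by-step visual check (the steps and state labels are invented; a real system would classify camera frames, not compare strings):

```python
# Hypothetical sketch: verify each step visually before moving on,
# instead of letting an early mistake compound for 10-15 minutes.

STEPS = [
    {"id": "unplug", "instruction": "Unplug the machine", "expected": "power_off"},
    {"id": "open_flap", "instruction": "Open the filter flap", "expected": "flap_open"},
    {"id": "remove_filter", "instruction": "Unscrew the drain filter", "expected": "filter_out"},
]

def guided_repair(steps, observe):
    """Stop at the first step whose observed state doesn't match the expected one."""
    for step in steps:
        actual = observe(step["id"])  # e.g. a vision model labeling a live frame
        if actual != step["expected"]:
            return f"Stop at '{step['id']}': expected {step['expected']}, saw {actual}"
    return "all steps verified"

# Simulated camera: the customer forgot to unplug the machine.
print(guided_repair(STEPS, lambda step_id: "power_on" if step_id == "unplug" else "ok"))
```

The design point is the early return: the check halts at the first mismatched step, which is exactly the failure mode text-only support cannot catch until the customer reports it.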

Brands that recognize this can move from improving satisfaction scores to building support customers can actually rely on, where the AI's confidence is grounded in something real.

