AI vendors are becoming consultants because support agents keep slipping up
Enterprises are learning that rolling out AI agents takes more than a few logins. OpenAI is reportedly hiring hundreds of engineers to customize models with customer data and build agents for clients, with about 60 already in consulting roles and 200+ in technical support. Anthropic is also working hands-on with customers instead of just shipping an app.
The reason is simple: out-of-the-box agents aren't reliable enough for production support. Retailer Fnac reportedly tested OpenAI and Google models for customer service, but the agents kept mixing up serial numbers, only stabilizing after help from AI21 Labs.
Why this matters for customer support leaders
- Expect services, not just software. Real value often starts with a consulting sprint to wire models into your stack and data.
- Integration is the work. Agents must talk to your systems of record, apply business rules, and handle edge cases before an agent UI is even useful.
- Rollouts will take longer than a typical SaaS deployment. Budget time for evaluation, guardrails, and change management.
- Vendor choice now affects process design, not just price. You're buying a playbook and a team, not only a model.
Context: "Frontier" shows the hidden work
OpenAI's new agentic enterprise platform, Frontier, highlights the moving parts: connect to systems of record, encode business context, execute and optimize agents, then layer interfaces on top. That stack explains why providers are leaning into consulting, and why scaling B2B agents may be slower than the pitch decks suggest.
Tools like Claude Cowork can help, but speed-to-value depends on your connectors, policies, and data hygiene. Model gains will lift routine tasks; security and reliability risks won't vanish overnight.
Where AI agents break in support
- Entity mix-ups: serial vs. order vs. ticket ID; customer "John A." vs. "John B."
- Tool-call failures: missing auth, timeouts, flaky APIs, non-idempotent actions.
- Partial context: agent sees the ticket but not the warranty, policy, or previous RMAs.
- Edge cases: returns across channels, bundles, fraud flags, regional policies.
- State management: multi-step flows without checkpoints or rollback (one mitigation is sketched after this list).
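Two of these failure modes, flaky tool calls and unmanaged state, are partly engineering problems. Here is a minimal sketch of one mitigation, assuming a hypothetical create_rma call into your order system: retries are only safe because a reused idempotency key lets the downstream system deduplicate, and checkpoints keep a restarted flow from repeating completed steps.

```python
import time
import uuid

def create_rma(order_id: str, idempotency_key: str) -> dict:
    # Hypothetical order-system call; stubbed here for illustration.
    return {"rma_id": "RMA-0001", "status": "created"}

def call_with_retries(fn, *args, attempts=3, backoff_s=2.0, **kwargs):
    # Retry flaky or timing-out calls; only safe when fn is idempotent.
    for attempt in range(1, attempts + 1):
        try:
            return fn(*args, **kwargs)
        except (TimeoutError, ConnectionError):
            if attempt == attempts:
                raise
            time.sleep(backoff_s * attempt)

checkpoints: dict = {}  # step name -> result, so a restarted flow skips finished steps

def run_step(step_name: str, fn, *args, **kwargs):
    if step_name in checkpoints:
        return checkpoints[step_name]
    result = call_with_retries(fn, *args, **kwargs)
    checkpoints[step_name] = result
    return result

# One idempotency key per logical action: retries reuse it, so the order
# system can deduplicate instead of creating a second RMA.
rma = run_step("create_rma", create_rma,
               order_id="ORD-123456", idempotency_key=str(uuid.uuid4()))
```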
A practical rollout plan for support teams
- Pick one narrow use case. Example: warranty lookup and reply draft for post-purchase tickets.
- Define success upfront. Target FCR, deflection, AHT variance, CSAT change, and escalation rate.
- Build a gold test set. 100-300 real tickets with ground-truth answers and tool calls; the evaluation sketch after this list shows one way to score against it.
- Wire to systems safely. Read-only first. Scope by team, region, and action. Add canary tenants.
- Human-in-the-loop. Drafts on day one. Graduated autonomy only after stable metrics.
- Guardrails that bite. PII redaction, policy snippets, tool schemas with strict validation, and output filters.
- Fine-tune with your data. Start with retrieval over policies; consider supervised fine-tuning once you have labeled examples.
- Instrument everything. Track tool-call accuracy, hallucination flags, correction rate, and cost per resolved ticket.
- Have a rollback plan. Version prompts and tools as a bundle, and keep a one-click revert.
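To make the gold-test-set and instrumentation steps concrete, here is a minimal offline evaluation sketch. Everything in it is a placeholder: run_agent stands in for however you invoke your draft agent, and the JSONL field names and scoring rules are assumptions to adapt to your own schema.

```python
import json

def run_agent(ticket_text: str) -> dict:
    # Hypothetical agent entry point; returns a reply draft plus the tool calls it made.
    return {"reply": "stub", "tool_calls": []}

def evaluate(gold_path: str) -> dict:
    # Score agent output against a gold set of tickets with ground-truth tool calls.
    with open(gold_path) as f:
        gold = [json.loads(line) for line in f]  # one JSON ticket per line

    tool_hits, reply_hits = 0, 0
    for case in gold:
        out = run_agent(case["ticket_text"])
        # Tool-call accuracy: the agent must make exactly the expected calls with expected args.
        if out["tool_calls"] == case["expected_tool_calls"]:
            tool_hits += 1
        # Crude reply check: every required fact (IDs, dates, amounts) appears in the draft.
        if all(fact in out["reply"] for fact in case["required_facts"]):
            reply_hits += 1

    n = len(gold)
    return {"tool_call_accuracy": tool_hits / n, "fact_coverage": reply_hits / n, "cases": n}
```

Run something like this before every prompt or tool change; a drop in tool-call accuracy is your regression signal before customers ever see it.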
Data and security guardrails that prevent headlines
- Least-privilege access: separate service accounts per agent capability; rotate keys.
- Deterministic actions: APIs that require explicit IDs and confirmations; no free-text side effects (see the sketch after this list).
- Redaction and minimization: scrub PII before model calls; pass only what's needed per step.
- Hallucination containment: require tool-confirmed facts for order status, payments, and identity.
- Audit trails: log model prompts, responses, tool inputs/outputs, and human approvals.
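A rough sketch of what the whitelist, strict validation, redaction, and audit ideas above can look like in code. The action names, ID formats, and regex-based redaction are illustrative assumptions, not a complete PII solution.

```python
import json
import logging
import re

audit = logging.getLogger("agent.audit")

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact(text: str) -> str:
    # Minimize what reaches the model: strip obvious PII before any prompt is built.
    return EMAIL.sub("[email]", text)

# Whitelisted actions with strict argument schemas; anything else is rejected.
ALLOWED_ACTIONS = {
    "lookup_order": {"order_id": r"ORD-\d{6}"},
    "schedule_callback": {"ticket_id": r"TKT-\d{8}", "slot": r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}"},
}

def execute_action(name: str, args: dict, approved_by=None) -> None:
    schema = ALLOWED_ACTIONS.get(name)
    if schema is None:
        raise PermissionError(f"action {name!r} is not whitelisted")
    if set(args) != set(schema):
        raise ValueError(f"{name} requires exactly {sorted(schema)}")
    for field, pattern in schema.items():
        # Explicit, well-formed IDs only; no free-text arguments reach the system of record.
        if not re.fullmatch(pattern, str(args[field])):
            raise ValueError(f"{field}={args[field]!r} fails validation")
    # Audit trail: who approved what, with which inputs, before the side effect happens.
    audit.info(json.dumps({"action": name, "args": args, "approved_by": approved_by}))
    # ... dispatch to the real system of record here ...
```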
Vendor evaluation checklist
- Do they offer implementation engineers and playbooks for support use cases?
- Proven connectors to your CRM, order system, and knowledge base? How are errors handled?
- Offline evaluation tools with your test set? Support for regression testing before deploys?
- Safety features: data residency, PII controls, action whitelists, and approval flows.
- Reliability metrics shared weekly: tool-call success, rollback reasons, incident history.
- Support model: response SLAs, on-call escalation, and who owns post-mortems.
- Total cost clarity: model usage, integration time, and ongoing ops headcount.
What to pilot now (low risk, high signal)
- Agent assist: suggested replies and macros grounded in policies and past resolutions.
- Auto-tagging and routing: classify intent, product, sentiment, and urgency (see the sketch after this list).
- Case summarization: compress long threads for faster handoffs and QA.
- Content QA: policy checks before sending offers, refunds, or replacements.
- Controlled actions: safe, reversible steps like scheduling or FAQ links, not refunds.
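For the auto-tagging pilot, the key design choice is constraining the model to a fixed label set and refusing to guess when the output falls outside it. A minimal sketch, with call_model as a stand-in for your provider's API and example labels (intent and urgency only) to replace with your own taxonomy:

```python
def call_model(prompt: str) -> str:
    # Hypothetical model call; swap in your provider's client.
    return "billing|high"  # stub output for illustration

INTENTS = {"billing", "shipping", "returns", "technical", "account"}
URGENCY = {"low", "medium", "high"}

def tag_ticket(ticket_text: str) -> dict:
    prompt = (
        "Classify the support ticket.\n"
        f"Intent (one of {sorted(INTENTS)}) and urgency (one of {sorted(URGENCY)}).\n"
        "Answer as: intent|urgency\n\n"
        f"Ticket: {ticket_text}"
    )
    raw = call_model(prompt).strip().lower()
    intent, _, urgency = raw.partition("|")
    # Reject anything outside the allowed labels; route to human triage instead of guessing.
    if intent not in INTENTS or urgency not in URGENCY:
        return {"intent": "triage", "urgency": "medium", "needs_review": True}
    return {"intent": intent, "urgency": urgency, "needs_review": False}
```

Product and sentiment tags can be added the same way; the pattern stays the same: a closed label set, validation on the way out, and a human-triage fallback.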
The bottom line
AI agents can reduce handle time and lift consistency, but only after you wire them into your stack with tight controls. That's why OpenAI and Anthropic are acting like consultants, and why your roadmap should treat agent work as a program, not a widget. Start narrow, instrument deeply, and earn autonomy with data.
Upskill your support team
If you're building an internal playbook and need structured training for support roles, explore our AI courses by job and focused certifications for Claude and ChatGPT.