Fix the humans before scaling AI at the CRA, experts warn
OTTAWA - A new report from the Auditor General raises a red flag: the Canada Revenue Agency is missing calls, and when it does answer, fewer than one in five callers get accurate help on personal income taxes. The same review found the CRA's rule-based chatbot, Charlie, was right about one-third of the time. The agency is now piloting a generative AI chatbot, extending hours, and broadening what the bots can answer.
Experts say that's the wrong order. If human service is unreliable, AI trained on that system will amplify the problem.
What the Auditor General found
The audit points to two issues: access and accuracy. Too many calls go unanswered, and agents struggle to provide correct guidance when callers do get through.
Charlie, the scripted chatbot, performs better on simple questions but still misses the mark often. A pilot using generative AI is underway, with longer operating hours and more topics on the roadmap. The risk is clear: scaling weak processes with automation spreads weak outcomes faster. For context on oversight expectations, see the federal Directive on Automated Decision-Making and the Office of the Auditor General's resources at oag-bvg.gc.ca.
What experts say will work
Anatoliy Gruzd's view is blunt: fix the people and process first. If agents don't have the right answers, chatbots won't either.
Adegboyega Ojo recommends a mixed model: let machines handle repeatable FAQs, then hand off nuanced cases to trained specialists. Keep human oversight in the loop to maintain context and accountability.
Jasmin Manseau draws the line between prediction and judgment. Routine tasks can be automated; judgment-heavy calls should remain with people. Public trust depends on accuracy reaching a reliable threshold (think 70 to 90 percent) before broad adoption can be expected.
What government leaders can do now
- Stabilize the human baseline: Build a single source of truth for tax guidance. Give agents updated playbooks and decision trees. Align chatbot scripts to the same source.
- Publish accuracy and accessibility metrics: Track first-contact resolution, accuracy by topic, escalation rates, and wait times. Share targets and progress.
- Design smart triage: Use AI for eligibility checks, glossary-level definitions, and deadline reminders. Auto-route edge cases and multi-rule scenarios to specialists (a minimal routing sketch follows this list).
- Tight escalation paths: Make handoffs seamless; the transcript, caller context, and prior steps should follow the case. Measure time-to-human and resolution quality.
- Close the feedback loop: Every incorrect answer, human or bot, feeds a correction cycle. Update playbooks, retrain agents, and rescript the bot within set SLAs.
- Guardrails first, then scale: Apply impact assessments, bias tests, and human-in-the-loop controls before expanding hours or scope. Map controls to the TBS Directive levels.
- Test like you mean it: Run A/B tests on scripts and flows. Red-team the chatbot with tricky edge cases. Validate against real call logs before production (see the evaluation sketch after this list).
- Reduce demand at the source: Rewrite high-friction letters and web pages that trigger calls. Add plain-language calculators, checklists, and status tools to cut avoidable volume.
- Train for judgment: Focus agent training on interpretation, exceptions, and policy changes. Let AI handle lookups; let people handle consequences.
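To make the triage and escalation items concrete, here is a minimal sketch of bot-first routing with a context-carrying handoff. Everything in it is an illustrative assumption: the topic names, the confidence floor, and the HandoffPacket fields are hypothetical, not CRA systems.

```python
from dataclasses import dataclass, field

# Stand-in for the single source of truth that agents and the bot
# would both read from. Topics and answers are illustrative only.
KNOWLEDGE_BASE = {
    "filing_deadline": "Most individual returns are due April 30.",
    "glossary": "Plain-language definitions of common tax terms.",
    "eligibility_check": "Scripted questions to screen benefit eligibility.",
}
CONFIDENCE_FLOOR = 0.85  # assumed cutoff; tune against real call logs

@dataclass
class HandoffPacket:
    """Context that should follow the case to a human specialist."""
    caller_id: str
    topic: str
    transcript: list[str]
    steps_tried: list[str] = field(default_factory=list)

def triage(caller_id: str, topic: str, confidence: float,
           transcript: list[str]) -> str | HandoffPacket:
    """Answer repeatable questions; escalate everything else with context."""
    if topic in KNOWLEDGE_BASE and confidence >= CONFIDENCE_FLOOR:
        return KNOWLEDGE_BASE[topic]
    # Edge cases and low-confidence classifications go to a person,
    # carrying the transcript so the caller never repeats themselves.
    return HandoffPacket(caller_id, topic, transcript,
                         steps_tried=["bot_triage"])
```

The design choice worth copying is the return type: the router either answers from the shared source of truth or hands over a complete case file, never a cold transfer.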
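And here is a minimal sketch of the pre-production validation the testing item describes: scoring a bot against a gold set drawn from verified call logs, broken out by topic, with a launch gate at an accuracy floor (echoing the 70 to 90 percent threshold mentioned above). The gold-set format, the containment match, and the floor value are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical gold set built from verified, resolved call logs:
# (topic, caller question, phrase a correct answer must contain).
GOLD_SET = [
    ("filing_deadline", "When is my return due?", "april 30"),
    ("filing_deadline", "What happens if I file late?", "penalty"),
    ("benefits", "Do I qualify for the credit?", "eligibility"),
]

def evaluate(bot, gold_set):
    """Score answer accuracy per topic so weak areas surface pre-launch."""
    hits, totals = defaultdict(int), defaultdict(int)
    for topic, question, must_contain in gold_set:
        totals[topic] += 1
        if must_contain in bot(question).lower():  # crude containment check
            hits[topic] += 1
    return {topic: hits[topic] / totals[topic] for topic in totals}

def blocked_topics(scores, floor=0.9):
    """Topics below the accuracy floor stay human-only for now."""
    return [topic for topic, accuracy in scores.items() if accuracy < floor]

if __name__ == "__main__":
    dummy_bot = lambda q: "Most individual returns are due April 30."
    scores = evaluate(dummy_bot, GOLD_SET)
    print(scores, "keep human-only:", blocked_topics(scores))
```

Per-topic scores matter more than a single headline number: an aggregate can look acceptable while specific topics, like late-filing penalties above, fail badly.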
Bottom line
AI can speed up repeatable work. Trust is earned on the hard calls. Get human accuracy and processes right, then automate the right slices with tight oversight.
If your team is standing up human-AI operations and needs practical upskilling paths, see curated options by role at Complete AI Training.