Alaska's probate chatbot shows how hard "accurate enough" really is
Probate is paperwork under stress. If the advice is wrong, families pay for it. Alaska's courts spent more than a year building an AI probate assistant (AVA) to help with forms and procedure. What started as a three-month build turned into a long slog of false starts, model tweaks, and hard lessons.
The takeaway for legal teams: accuracy beats speed. In high-stakes contexts, a friendly answer that's 95% right can still cause harm. That's why the AVA team insisted on higher standards than a typical tech rollout.
Accuracy over convenience
Court leadership pushed for information users can act on without getting burned. "We need to be 100% accurate," said Alaska Court System administrative director Stacey Marz. Good enough wasn't good enough; incomplete guidance can derail an estate, waste months, and create liability exposure.
That mindset shifted the project from "ship fast" to "ship carefully." It slowed everything down, but it was the only acceptable path.
Persona tuning: less comfort, more clarity
Early AVA versions offered condolences. User testing said: stop. People in grief didn't want one more "sorry for your loss." They wanted direct answers and steps.
The team stripped out sympathy lines, tightened tone, and focused on plain-language explanations. Friendly, yes. But brief and precise beats warm and wordy.
Hallucinations forced tighter controls
Despite retrieval constraints, AVA still made things up. One example: it suggested a non-existent Alaska law school as a resource. That pushed the team to fence the model tightly within court-approved materials and cut off open web access.
Industry models are getting better, but not enough to trust without guardrails. The fix wasn't just "pick a smarter model." It was "limit the model to documents we trust, and make it cite them."
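As a rough illustration of that pattern, and not the court's actual implementation, the sketch below limits answers to a handful of court-approved passages, requires a citation, and refuses when retrieval comes back empty. The document names are invented, and call_model is a stub to replace with your provider's API.

```python
# Minimal sketch: answer only from approved materials, cite them, or refuse.
# Document names are illustrative; call_model() is a placeholder.

APPROVED_SOURCES = {
    "Probate Self-Help Guide, ch. 2": "How to open an informal probate case, which forms to file, and where to file them.",
    "Form P-110 instructions": "Who may apply for informal probate and what attachments are required.",
}

REFUSAL = ("I can only answer from court-approved materials. "
           "Please contact the self-help center about this question.")

def call_model(prompt: str, temperature: float = 0.0) -> str:
    """Placeholder: swap in your LLM provider's chat-completion call."""
    raise NotImplementedError("connect your model provider here")

def retrieve(question: str, k: int = 2) -> list[tuple[str, str]]:
    """Naive keyword retrieval over approved passages (use a real retriever in practice)."""
    words = set(question.lower().split())
    scored = []
    for name, text in APPROVED_SOURCES.items():
        score = sum(w in text.lower() for w in words)
        if score > 0:
            scored.append((score, name, text))
    scored.sort(reverse=True)
    return [(name, text) for _, name, text in scored[:k]]

def answer(question: str) -> str:
    passages = retrieve(question)
    if not passages:
        return REFUSAL  # nothing relevant in approved materials: refuse, don't guess
    context = "\n\n".join(f"[{name}]\n{text}" for name, text in passages)
    prompt = (
        "Answer ONLY from the passages below. After each statement, cite the "
        "bracketed source name. If the passages do not answer the question, "
        f"reply exactly with this refusal message: {REFUSAL}\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return call_model(prompt)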
Testing that humans can actually maintain
The team wrote a 91-question accuracy test spanning forms, procedures, and common user paths. It was too heavy to run and review regularly. They shrank it to 16 high-signal questions: known past misses, tricky edge cases, and likely high-volume asks.
That smaller suite made routine regression checks realistic. Without sustainable testing, quality drifts the moment models shift.
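A suite that small is also easy to automate. Below is a generic sketch of a weekly regression check, not Alaska's actual questions: each case pairs a question with phrases the answer must contain (a form number, a refusal) and phrases it must not (a known past miss). The form numbers are made up, and ask_assistant is a stub for your deployed bot.

```python
# Compact regression check: a few high-signal questions with a simple rubric.
# ask_assistant() is a placeholder for your chatbot's query entry point.

TEST_CASES = [
    {
        "question": "Which form starts an informal probate case?",
        "must_include": ["P-110"],             # expected citation or key fact (illustrative)
        "must_exclude": ["law school"],        # known past hallucination
    },
    {
        "question": "Can you tell me who should inherit the house?",
        "must_include": ["not legal advice"],  # should decline fact-specific questions
        "must_exclude": [],
    },
]

def ask_assistant(question: str) -> str:
    """Placeholder: call your deployed assistant here."""
    raise NotImplementedError

def run_suite() -> None:
    failures = 0
    for case in TEST_CASES:
        reply = ask_assistant(case["question"]).lower()
        ok = (all(p.lower() in reply for p in case["must_include"])
              and not any(p.lower() in reply for p in case["must_exclude"]))
        if not ok:
            failures += 1
            print(f"FAIL: {case['question']}\n  got: {reply[:200]}")
    print(f"{len(TEST_CASES) - failures}/{len(TEST_CASES)} passed")

if __name__ == "__main__":
    run_suite()
```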
Cheap tokens, expensive oversight
On paper, the costs look great: roughly 11 cents for 20 queries under one setup. The real cost shows up in supervision. Model versions change, behaviors drift, prompts decay. Someone has to watch the store.
The plan now includes regular checkups, prompt updates, and model swaps as providers retire versions. This isn't "set it and forget it." It's ongoing operations work.
Launch, with recalibrated expectations
AVA is slated to go live in late January, but the goal has shifted. The team originally wanted to match what human facilitators provide; today, they're aiming for reliable self-help on defined topics, with guardrails and a clear boundary: guidance, not advice.
The labor was heavy. The hype about generative AI glosses over how much human effort it takes to make a legal assistant both helpful and safe.
Practical guidance for courts and legal teams building AI assistants
- Scope narrowly: Limit to process and forms. No legal advice, no fact-specific interpretations. Add visible disclaimers and "this is not legal advice" notices.
- Retrieval first, not web search: Ground every answer in approved materials. Require citations to specific sections of court guides and forms. If unsure, the bot should refuse and route to a human.
- Hallucination controls: Use low temperature, strict system prompts, and a hard "don't guess" rule. Prefer short, cited answers over long narratives.
- Human escalation: Provide an easy handoff to staff or a callback request for edge cases, missing documents, or multi-issue questions.
- Evaluation you can run weekly: Keep a compact, high-value test set (10-25 questions). Include past failures, edge cases, and top FAQs. Track pass/fail with simple rubrics and log samples.
- Change management: Pin model versions. Document prompts. Re-validate on every model update. Keep a rollback plan and release notes. A minimal version-pinning check is sketched after this list.
- Tone and UX: Use plain English, step-by-step outputs, and links to exact forms. Skip condolences and filler. Offer "teach me more" and "just give me the form" paths.
- Compliance and risk: Address unauthorized practice of law risks, privacy, and retention. Don't store sensitive facts in vendor logs. Add visible scope limits and consent notices.
- Security and data: Use provider options that disable training on your data. Minimize PII. Set retention to the shortest practical window.
- Cost control: Cache frequent answers, rate-limit to keep spend predictable, and monitor outliers. Optimize context windows to reduce token waste. A simple cache-and-rate-limit sketch also follows this list.
- Governance: Appoint an owner for policy, an owner for prompts, and an owner for evaluation. Meet monthly. Review metrics, incidents, and change requests.
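On change management: one lightweight pattern, assuming your provider reports which model version actually served a request, is to pin the version and generation settings in one place and block a release when they drift. A minimal sketch, with a hypothetical version string:

```python
# Pin model version and generation settings; re-validate before serving traffic.
# get_served_model_id() is a placeholder for however your provider reports
# which model version actually handled a request.

PINNED_CONFIG = {
    "model": "provider-model-2024-06-01",  # hypothetical pinned version string
    "temperature": 0.0,                    # as deterministic as the provider allows
    "max_output_tokens": 400,              # short, cited answers over long narratives
}

def get_served_model_id() -> str:
    """Placeholder: query your provider or read it from a response header."""
    raise NotImplementedError

def validate_deployment() -> None:
    served = get_served_model_id()
    if served != PINNED_CONFIG["model"]:
        # The model drifted (for example, the provider retired the pinned version).
        # Block the release until the regression suite passes on the new model.
        raise RuntimeError(
            f"Model changed: expected {PINNED_CONFIG['model']}, got {served}. "
            "Re-run the regression suite and update release notes before going live."
        )
```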
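On cost control: an in-memory answer cache plus a per-user rate limit keeps spend predictable for repeated, high-volume questions. A minimal single-server sketch; a production deployment would want a shared cache and smarter question normalization:

```python
# Simple cost controls: cache answers to repeated questions and rate-limit users.
import time
from collections import defaultdict, deque

CACHE: dict[str, str] = {}
REQUESTS: dict[str, deque] = defaultdict(deque)
MAX_REQUESTS_PER_MINUTE = 10

def normalize(question: str) -> str:
    """Collapse whitespace and case so repeated questions hit the same cache entry."""
    return " ".join(question.lower().split())

def rate_limited(user_id: str) -> bool:
    now = time.time()
    window = REQUESTS[user_id]
    while window and now - window[0] > 60:   # drop requests older than one minute
        window.popleft()
    if len(window) >= MAX_REQUESTS_PER_MINUTE:
        return True
    window.append(now)
    return False

def cached_answer(user_id: str, question: str, answer_fn) -> str:
    if rate_limited(user_id):
        return "You're sending questions very quickly. Please wait a moment and try again."
    key = normalize(question)
    if key not in CACHE:
        CACHE[key] = answer_fn(question)     # tokens are only spent on a cache miss
    return CACHE[key]
```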
Resources
- NIST AI Risk Management Framework - practical structure for risk, controls, and governance.
- National Center for State Courts - guidance, toolkits, and court tech resources.
Want structured upskilling for legal teams?
If you're standing up an AI assistant and need repeatable prompts, evaluation templates, and governance checklists, see our courses by job: Complete AI Training.