VLAIR Aftermath: AI Outscores Lawyers, Big Vendors Blink

Vals' new study found AI beat human lawyers on accuracy, sources, and clarity in legal research. Expect leaner bills, standardized workflows, and tougher demands on vendors.

Published on: Oct 21, 2025

Vals Legal AI Research Eval: The Aftermath and What It Means for Legal Teams

Vals has released its latest Legal AI Report, focused on legal research. It grabbed attention for two reasons: key vendors didn't fully participate, and AI outperformed human lawyers across the board.

If you lead a practice or run a research team, this one matters. It's not hype. It's a signal.

What Vals Tested

Vals evaluates AI systems across multiple domains. This study zeroed in on legal research using 200 questions typically seen in private practice.

Participants included Alexi, Counsel Stack, Midpage, and ChatGPT. At least one major vendor participated but chose not to make results public.

Method matters: responses were collected in July 2025 via API using zero-shot prompts. No follow-ups. No prompt engineering. Human lawyers, sourced from a US law firm, answered the same questions using non-generative tools and standard research databases.
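The collection protocol described above (single turn, zero-shot, no follow-ups) can be sketched as a request builder. The payload shape below follows common chat-completion APIs; the field names and model string are illustrative assumptions, not Vals' actual harness.

```python
def build_zero_shot_request(question: str, model: str) -> dict:
    """Build a single-turn, zero-shot request: one user message containing
    the raw research question, no engineered system prompt, no follow-ups."""
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": question},  # the question, verbatim
        ],
    }

# One request per question; responses would be collected once, with no retries
# or clarifying turns, mirroring the study's zero-shot setup.
requests = [
    build_zero_shot_request(q, "example-model")
    for q in ["Is an oral contract enforceable in New York?"]
]
```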

The Headline: AI Beat Human Lawyers

Across three weighted criteria, AI led on every front:

  • Accuracy (50%): Correct, with no incorrect elements
  • Authoritativeness (40%): Supported by valid primary/secondary sources
  • Appropriateness (10%): Clear, shareable with clients/colleagues
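Under those weights, a response's composite score is a plain weighted average. The sketch below shows the arithmetic; scoring each criterion on a 0–1 scale is an assumption for illustration, not a detail stated in the report.

```python
# Weights from the study's three criteria.
WEIGHTS = {"accuracy": 0.50, "authoritativeness": 0.40, "appropriateness": 0.10}

def composite_score(scores: dict) -> float:
    """Weighted sum of per-criterion scores (each assumed on a 0-1 scale)."""
    assert set(scores) == set(WEIGHTS), "score every criterion exactly once"
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)

# e.g. strong accuracy, weaker sourcing:
# 0.5*0.9 + 0.4*0.7 + 0.1*0.8 = 0.45 + 0.28 + 0.08 = 0.81
s = composite_score(
    {"accuracy": 0.9, "authoritativeness": 0.7, "appropriateness": 0.8}
)
```

Note how the 50% accuracy weight dominates: a tool that sources beautifully but answers incorrectly cannot win.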

Yes, even the general model performed strongly. On pure Q&A legal research, AI was simply faster and cleaner.

Read the Fine Print

This wasn't a test of end-to-end lawyering. No client nuance, no strategy, no court filings, no adversarial pressure. It was research-only.

Human lawyers weren't incentivized to grind through every question with partner-level scrutiny. Give a team time, pressure, and feedback loops and human scores would likely rise.

Also note: July 2025 to publication is a long stretch for models that ship updates frequently. Everyone improves during that window.

Why the Big Vendors Sat Out

Some large brands argued that the lag between testing and publication makes the numbers stale. Others said the task selection favors certain approaches. The simple truth: the bigger you are, the more you risk by publishing a second-place score, no matter how small the gap.

Should they participate? Yes. Evals are most useful when they include the tools lawyers actually use daily, not just the newer entrants.

What This Means for Your Stack

Could a firm use ChatGPT for research? Yes. The results show it can hold its own on accuracy. That said, legal-specific platforms still bring the application layer: domain-tuned prompts, RAG pipelines, digestible UI, workflow controls, audit logs, and DMS integrations. Those are not trivial.

So no, this doesn't replace lawyers or even your platform. It changes the price-performance expectations for research and puts pressure on vendors to deliver more than "point and answer."

Client and Practice Implications

  • Billing: If AI handles first-pass research well, clients will expect leaner bills and clearer justification for manual hours.
  • Process: Standardize how your team prompts, cites, and verifies. Document the workflow.
  • Policies: Require source verification and chain-of-custody for citations. Log prompts and outputs.
  • People: Train associates to critique AI outputs, not just copy them. Reward good judgment over raw volume.
  • Vendors: Ask for eval data, audit trails, source transparency, and controls. Push for measurable gains over generic claims.

Quotes That Matter

Rayan Krishnan (Vals): "General models can do legal research well. There are still gaps in authoritative sourcing, but the gap may narrow."

Tara Waters: "I wasn't surprised by ChatGPT's accuracy. Sourcing isn't yet optimized for proper legal sources first; likely fixable. Also, we didn't test 'research and prepare for court filings,' which is where public chatbots get into trouble."

Do We Have the Full Picture?

No single eval can capture the full job of a lawyer. But benchmarks help. They keep vendors honest and help buyers compare options on something more concrete than demos.

We need a basket of evals across tasks, jurisdictions, and workflows. If you want a model for how broader testing frameworks can work, see Stanford's HELM. For risk and governance framing, NIST's guidance is a useful anchor: AI Risk Management Framework.

What to Do Next

  • Pilot two tools: one legal-specific platform and one general model with guardrails. Compare outputs, speed, and cost per matter.
  • Set a "cite-or-flag" rule: no output goes to a client without validated primary sources.
  • Reprice research: fixed fees or caps where AI handles the heavy lift.
  • Track outcomes: accuracy, time saved, and client feedback. Use that data in vendor renewals.
  • Train your team: prompt patterns, verification, and ethical use.
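The "cite-or-flag" rule from the list above can be expressed as a simple gate. The data shapes here (a citation record with primary-source and verification flags) are assumptions for illustration, not a real product API.

```python
from dataclasses import dataclass

@dataclass
class Citation:
    source: str       # e.g. a case name or statute reference
    is_primary: bool  # primary authority (case law, statute) vs secondary
    verified: bool    # confirmed by a human against the original source

def cite_or_flag(answer: str, citations: list[Citation]) -> tuple[str, bool]:
    """Return (answer, cleared). cleared is True only if the answer rests on
    at least one verified primary source and contains nothing unverified;
    otherwise it must be flagged for review before going to a client."""
    has_primary = any(c.is_primary and c.verified for c in citations)
    all_verified = all(c.verified for c in citations)
    cleared = bool(citations) and has_primary and all_verified
    return answer, cleared
```

Logging each (answer, cleared) pair alongside the prompt gives you the chain-of-custody record the policies bullet calls for.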

Bottom Line

AI is strong at legal research. That's good news. Use it to reduce cost, improve speed, and refocus human time on strategy, client counsel, and judgment. Keep pushing vendors for verifiable accuracy and transparent sourcing, and keep your lawyers in the loop where it matters.

The full report is available via Vals' main website.

