Judge agents at Lloyds: GenAI that scales personalised, FCA-compliant guidance
Lloyds uses agent-as-judge AI: a generator plus independent reviewers to cut errors and meet FCA rules. Specialist models handle finance; humans review high-impact cases.

Interview: Using AI agents as judges in GenAI workflows
Forty years ago, a branch manager knew your name and your story. That level of personal guidance doesn't scale. As Ranil Boteju, chief data and analytics officer at Lloyds Banking Group, puts it: most people can't afford a financial planner, and there aren't enough advisers to go around.
The bank's answer: agentic AI that can be audited, measured and kept within the guardrails of UK regulation. The goal is simple: wider access to high-quality guidance without compromising accuracy or accountability.
Why "agent-as-judge" matters for finance
Large language models can produce confident but wrong answers. In a regulated sector, that's a hard stop. Boteju's team is tackling this with an "agent-as-judge" pattern: one model generates an answer; separate models review, score and approve or reject it against clear policies and FCA expectations.
This second-line review reduces the risk of hallucinations, checks for bias, and makes decisions traceable. It doesn't replace people. "There is still very much a place for humans in the loop," Boteju says.
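In code terms, the pattern is a small control loop: the generator drafts, an independent judge scores the draft, and anything below threshold is held for a person. A minimal sketch follows, assuming generic model calls; the function names, rubric and 0.9 threshold are illustrative, not Lloyds' implementation.

```python
# Minimal agent-as-judge control loop. Every name here (generate_answer,
# judge_answer, the 0.9 threshold) is a hypothetical illustration, not
# Lloyds' implementation.
from dataclasses import dataclass

@dataclass
class Verdict:
    score: float      # 0.0-1.0 rubric score assigned by the judge
    rationale: str    # the judge's stated reason, kept for audit
    approved: bool

def generate_answer(question: str) -> str:
    # Placeholder for a call to the generator model.
    return f"Draft answer to: {question}"

def judge_answer(question: str, answer: str, threshold: float = 0.9) -> Verdict:
    # Placeholder for an independent judge model scoring the draft
    # against policy: factuality, FCA alignment, bias, tone.
    score = 0.95  # a real judge derives this from a rubric prompt
    return Verdict(score, "Meets policy rubric.", score >= threshold)

def answer_with_review(question: str) -> str:
    draft = generate_answer(question)
    verdict = judge_answer(question, draft)
    if not verdict.approved:
        # Rejected drafts go to a human reviewer, never to the customer.
        return f"[Held for human review: {verdict.rationale}]"
    return draft

print(answer_with_review("Can I overpay my fixed-rate mortgage?"))
```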
Specialist models beat general models for regulated use
General LLMs learn from everything on the internet. That breadth isn't always useful in finance. Lloyds opted to back a financial-services-specific model, FinLLM, developed with Aveni and trained on UK-relevant financial data to cut noise and reduce error.
The bank also wants model choice, not lock-in. An open approach to foundation models supports sovereignty and the ability to select the best tool for each task.
Real deployment: an audit assistant with checks and balances
Lloyds has tested FinLLM in Group Audit & Conduct Investigations. An audit chatbot integrates generative AI with the bank's internal Atlas documentation system to make retrieval faster and more precise.
The flow: FinLLM is tuned on audit knowledge; a generator proposes an answer; independent judge agents score it for compliance and accuracy; humans review edge cases. Outputs must align with FCA guidance and internal policy before they reach users.
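A stubbed sketch of that flow, under stated assumptions: `atlas_search` stands in for the bank's internal Atlas retrieval, which is not public, and the judge scores and pass threshold are invented for illustration.

```python
# Stubbed version of the audit-assistant flow described above: retrieve
# from the documentation store, draft with the domain model, score with
# judge agents, gate on human review. "atlas_search" stands in for the
# internal Atlas system; scores and threshold are assumptions.
PASS_THRESHOLD = 0.9

def atlas_search(query: str, k: int = 3) -> list[str]:
    # Placeholder retrieval: a real system returns approved documents.
    return [f"doc-{i}: excerpt relevant to '{query}'" for i in range(k)]

def draft_with_finllm(query: str, sources: list[str]) -> str:
    # The domain model is prompted with the query plus retrieved text,
    # so every claim can be traced to an approved source.
    return f"Answer to '{query}', citing {len(sources)} sources."

def judge_scores(query: str, answer: str, sources: list[str]) -> dict[str, float]:
    # One independent judge agent per rubric dimension.
    return {"accuracy": 0.97, "policy_alignment": 0.94, "completeness": 0.91}

def audit_assistant(query: str) -> str:
    sources = atlas_search(query)
    answer = draft_with_finllm(query, sources)
    scores = judge_scores(query, answer, sources)
    if min(scores.values()) < PASS_THRESHOLD:
        return "[Edge case: routed to a human reviewer]"
    return answer

print(audit_assistant("What evidence supports this control finding?"))
```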
How agentic AI is orchestrated
Different models play to different strengths. A hyperscaler model (e.g., OpenAI's GPT-5 or Google's Gemini) can parse what a customer actually means. FinLLM handles the regulated, domain-specific reasoning. Other agents break the request into parts and solve each piece.
Judge agents act like a second-line colleague: they verify outcomes, check rationale, reference sources, and flag anything that needs human attention.
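Orchestration of this kind can be as simple as a route table: the general model classifies intent, the request is decomposed into sub-tasks, and each sub-task is sent to the model best suited to it. The routes, model labels and task names below are hypothetical.

```python
# Hypothetical route table for the orchestration described above: a
# general model parses intent, the request is decomposed, and each
# sub-task runs on the model best suited to it. Model and task names
# are invented for illustration.
ROUTES = {
    "intent": "general-hyperscaler-llm",  # language fluency, intent parsing
    "regulated": "finllm",                # domain-specific financial reasoning
}

def parse_intent(utterance: str) -> str:
    # The general model classifies what the customer actually means.
    return "mortgage_overpayment_query"

def decompose(intent: str) -> list[str]:
    # A planner agent splits the request into solvable pieces.
    return ["fetch_product_terms", "compute_overpayment_impact", "draft_reply"]

def route(task: str) -> str:
    # Regulated calculations go to the specialist; wording stays general.
    return ROUTES["regulated"] if task.startswith(("fetch", "compute")) else ROUTES["intent"]

def orchestrate(utterance: str) -> list[tuple[str, str]]:
    intent = parse_intent(utterance)
    return [(task, route(task)) for task in decompose(intent)]

print(orchestrate("Can I pay extra on my mortgage without a fee?"))
```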
What this means for finance leaders
- Accuracy is a control problem, not just a model problem. Treat "agent-as-judge" as part of your second line of defence.
- Use specialist models for regulated reasoning; use general models for intent parsing and language fluency.
- Keep humans in the loop for high-impact advice, vulnerable customers, and novel scenarios.
- Design for auditability: store prompts, retrieved sources, scores from judge agents, and final decisions.
- Align outputs with the FCA's Consumer Duty and fair-value outcomes. Codify these as scoring rubrics for judge agents.
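The last bullet is concrete enough to sketch: regulatory outcomes become rubric dimensions with weights and minimum floors that judge agents must satisfy. The four dimensions below paraphrase the FCA Consumer Duty outcomes; every number is an invented example, not a regulatory figure.

```python
# Consumer Duty outcomes expressed as a judge rubric: each dimension has
# a weight and a floor a response must clear. The dimensions paraphrase
# the FCA's Consumer Duty outcomes; all numbers are invented examples.
RUBRIC = {
    # dimension:              (weight, per-dimension floor)
    "products_and_services":  (0.25, 0.80),
    "price_and_fair_value":   (0.25, 0.80),
    "consumer_understanding": (0.30, 0.90),  # clarity matters most for guidance
    "consumer_support":       (0.20, 0.80),
}
OVERALL_THRESHOLD = 0.85  # assumed weighted pass mark

def passes_rubric(scores: dict[str, float]) -> bool:
    # Fail fast if any single dimension misses its floor...
    if any(scores[d] < floor for d, (_, floor) in RUBRIC.items()):
        return False
    # ...then require the weighted total to clear the overall threshold.
    total = sum(scores[d] * w for d, (w, _) in RUBRIC.items())
    return total >= OVERALL_THRESHOLD

print(passes_rubric({
    "products_and_services": 0.90, "price_and_fair_value": 0.85,
    "consumer_understanding": 0.92, "consumer_support": 0.88,
}))  # True: all floors met and the weighted total is ~0.89
```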
Practical blueprint to get started
- Define your high-risk use cases and exclude them from full automation initially.
- Stand up a retrieval layer that anchors responses in your approved policies, product docs, and rate cards.
- Choose a domain model (e.g., FinLLM-style) for financial reasoning; use a general LLM for intent classification and summarisation.
- Build judge agents with clear rubrics: factuality, policy alignment, bias checks, completeness, and tone. Require a pass threshold.
- Implement human review gates for advice, cross-selling, and vulnerable customer flags.
- Instrument metrics: hallucination rate, judge-pass rate, override rate, time-to-resolve, and customer outcome measures.
- Log everything for audit: prompts, retrieved documents, intermediate steps, judge scores, human overrides, and release approvals (see the sketch after this list).
- Run red-team evaluations with synthetic and historical cases, then retrain or tighten guardrails based on failure modes.
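To make the metrics and logging bullets concrete, here is one way to structure an append-only audit record from which judge-pass rate and override rate fall out directly. The field names and the JSONL file format are assumptions for illustration.

```python
# Append-only audit record per interaction; the metrics bullets above
# (judge-pass rate, override rate) are computed from the same log.
# Field names and the JSONL format are assumptions for illustration.
import json
import time
import uuid

LOG_PATH = "audit_log.jsonl"

def log_interaction(prompt, sources, judge_scores, passed, human_override=None):
    record = {
        "id": str(uuid.uuid4()),
        "ts": time.time(),
        "prompt": prompt,
        "retrieved_sources": sources,      # what grounded the answer
        "judge_scores": judge_scores,      # per-dimension scores for audit
        "judge_passed": passed,
        "human_override": human_override,  # None, "approved", or "rejected"
    }
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record

def metrics(path: str = LOG_PATH) -> dict[str, float]:
    with open(path) as f:
        records = [json.loads(line) for line in f]
    n = len(records) or 1  # avoid division by zero on an empty log
    return {
        "judge_pass_rate": sum(r["judge_passed"] for r in records) / n,
        "override_rate": sum(r["human_override"] is not None for r in records) / n,
    }

log_interaction("Summarise control gaps in Q2", ["doc-1"], {"accuracy": 0.96}, True)
print(metrics())
```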
Governance checklist for CFOs, CROs and Heads of Audit
- Policy mapping: tie model outputs to FCA Consumer Duty outcomes and internal conduct rules.
- Model risk management: apply MRM standards to LLMs, including validation, monitoring, change control, and issue remediation.
- Data controls: keep PII segmented, use purpose-bound data access, and rotate secrets/keys on schedule.
- Third-party risk: diversify models to avoid vendor lock-in; require transparency on training data and safety testing.
- Explainability: require sources and step-by-step reasoning artifacts from both generators and judge agents.
Where this is heading
Agentic AI won't replace regulated judgment, but it can scale high-quality guidance to far more people. The pattern is clear: specialized models for domain accuracy, general models for language, independent judges for safety, and humans for accountability.
For finance teams, the advantage goes to those who ship controlled systems early, measure failure modes, and iterate behind strong governance.
Further reading
FCA Consumer Duty
PRA: Model Risk Management principles (SS1/23)