US-UK Red Teaming Exposes AI Agent Hijacks and Universal Jailbreaks at OpenAI and Anthropic

US and UK labs probed OpenAI and Anthropic, exposing agent hijacks, prompt injection, and guardrail gaps. Agencies need red-team access, context security, and incident SLAs.

Categorized in: AI News Government

Published on: Sep 16, 2025

US and UK researchers quietly stress-tested commercial AI: what government teams need to know

OpenAI and Anthropic spent the past year giving U.S. and U.K. government labs deep access to their systems. The goal: probe for failure modes that criminals, foreign intelligence, or insiders could exploit.

According to the companies, researchers at NIST's U.S. Center for AI Standards for Innovation and the U.K. AI Security Institute tested models, classifiers, and even guardrail-free prototypes. The focus was abuse resistance in high-risk domains and how easily agents can be hijacked via context poisoning and prompt injection.

What the testing covered

OpenAI: Evaluations of ChatGPT and newer agent products across cyber and chemical-biological risk areas. Work expanded to red-teaming agent tooling and new pipelines to find and fix vulnerabilities with external evaluators.
Anthropic: Ongoing access to Claude models and a classifier used to detect jailbreaks. Testing targeted prompt injections, hidden instructions in context, and universal jailbreak methods.

Key findings government leaders should internalize

Compound vulnerabilities matter: OpenAI reports NIST surfaced two novel issues that, chained with a known AI hijacking technique, let testers take over another user's agent about 50% of the time, potentially controlling the agent's accessible computer session and impersonating the user on logged-in sites.
Agent context is a critical attack surface: Multiple exploit paths relied on poisoning the data the model or agent uses to decide actions, not breaking the base model weights.
Guardrail bypasses evolve: Anthropic says a universal jailbreak technique slipped past standard detection, prompting an overhaul of their safeguard architecture rather than a simple patch.
Security maturity is improving: Independent researchers report newer commercial models are harder to jailbreak than earlier releases. Coding models and some open-source systems, however, remain easier to steer into unsafe outputs.

Why this matters for public-sector programs

AI use in government is shifting from prototypes to production systems that touch sensitive data and mission workflows. The findings show that:

Attackers target the context layer (files, tools, browsing, APIs) more than the base model.
Red-teaming access-to agents, tools, and safety filters-is required to see real risk, not just demo risk.
Point fixes age fast; vendors need architecture-level responses and continuous evaluation cycles.

Immediate actions for agencies and programs

Adopt a standard: Map AI projects to the NIST AI Risk Management Framework and require vendors to show alignment in documentation and testing. NIST AI RMF
Contract for red-team access: Bake into SOWs the right to conduct or commission independent red-teaming against agents, tools, retrieval systems, and guardrails, including access to non-production builds and evaluation APIs.
Demand evaluation artifacts: Require structured reports on jailbreak resistance, prompt injection defenses, bio/cyber misuse tests, and incident postmortems with remediation timelines.
Secure the context layer: Gate agent tool use with allowlists, sandbox execution, strong auth, scoped tokens, and egress controls. Treat RAG sources and plugins as high-trust dependencies.
Set incident SLAs: Define vendor obligations for vuln disclosure, temporary mitigations, model or guardrail rollbacks, and notification windows.
Threat-model agents: Include impersonation, session hijack, and lateral movement objectives in tabletop exercises and penetration tests.
Train users and builders: Teach staff to spot data-poisoning and prompt-injection patterns. Provide safe prompting norms and approval flows for new tools and data connectors.

Policy signals vs. on-the-ground work

Some leaders have deprioritized public messaging on AI safety, and both U.S. and U.K. institutes dropped "safety" from their names. Despite that, the technical collaborations show a steady push to test and harden models in areas that intersect with national security, infrastructure, and public services.

What to ask vendors now

What agent-level red-team results can you share from the past 90 days? What changed because of those findings?
How do you detect and block prompt injection and context poisoning across RAG, tools, and browsing?
Do you support sandboxed execution with clear permissioning, audit logs, and kill switches for agent actions?
Can you provide guardrail-free test builds in a controlled environment for government evaluators?
What is your vulnerability disclosure policy and rollback plan for faulty safeguards or models?
How are non-production prototypes and safety filters validated before they reach mission environments?

Additional resources

Upskill your team

If you're building AI-enabled services or running evaluations, structured training helps standardize safety practices across program, security, and acquisition teams. See curated options here: AI certifications and training.

Get Daily AI News

Your membership also unlocks:

700+ AI Courses

700+ Certifications

Personalized AI Learning Plan

6500+ AI Tools (no Ads)

Daily AI News by job industry (no Ads)

US-UK Red Teaming Exposes AI Agent Hijacks and Universal Jailbreaks at OpenAI and Anthropic

US and UK researchers quietly stress-tested commercial AI: what government teams need to know

What the testing covered

Key findings government leaders should internalize

Why this matters for public-sector programs

Immediate actions for agencies and programs

Policy signals vs. on-the-ground work

What to ask vendors now

Additional resources

Upskill your team

Related AI News for people in Government

NVIDIA unveils AI Factory for Government reference design to secure and modernize federal AI

Super Micro Computer launches federal AI unit to deliver U.S.-made servers and data center solutions

Indonesia pushes back AI roadmap to early 2026, ethics first amid growing pressure to move faster

Applications open for Civil Service AI and Data Challenge delivering national impact and career-defining opportunities

About Complete AI:

Latest AI News for your Job:

Courses by AI Skill:

Courses by Job Field:

Courses by AI Company:

AI Tools for your Job:

AI Tools by Type:

AI Certifications by Skill:

AI Certifications by Job Field:

AI Certifications by Company: