GPT-5.5 releases with unverified safety stack as government evaluators flag jailbreak concerns

UK evaluators found a jailbreak in GPT-5.5 within six hours, but couldn't confirm OpenAI's fix actually worked. No independent body has authority to verify safety claims before public release.

Categorized in: AI News, Government
Published on: Apr 25, 2026

Government Can't Verify AI Safety. That's a Problem.

OpenAI released GPT-5.5 this week with a critical gap in oversight: the UK's AI Security Institute found a jailbreak that defeats the model's safeguards within six hours of testing but couldn't verify whether OpenAI's fixes actually work. The company says the issue is resolved. No independent evaluator has confirmed it.

GPT-5.5 is the most capable publicly available model for cyberattacks. The AISI tested it on a 32-step corporate network attack that would take an expert 20 hours; the model completed it. Unlike Anthropic, which keeps its most dangerous model restricted, OpenAI is releasing GPT-5.5 to the public, with a "Trusted Access" program for vetted users and safety guardrails for everyone else.

The problem: third-party evaluators like the AISI lack the access needed to confirm those guardrails work after OpenAI patches them. Government agencies rely on these evaluations to understand risks. They're getting incomplete information.

Why This Matters for Government

Frontier AI labs now make the final call on what to release and when. They invite evaluation. They don't have to act on the results. If OpenAI decides its safeguards are sufficient, there's no mechanism to override that decision, even if independent experts disagree.

This arrangement worked when AI posed theoretical risks. It doesn't work now. These models affect national security, critical infrastructure, and public safety. Government agencies evaluating AI need real verification, not corporate assurances.

Anthropic faces the same incentive structure. The Mythos leak this week showed its security practices aren't preventing access to dangerous capabilities. If Anthropic chose to release Mythos publicly tomorrow, nothing would stop it.

What Government Agencies Should Know

Federal evaluators, including the Center for AI Standards and Innovation (CAISI), which will be led by Chris Fall, formerly of the Department of Energy, have limited authority. They can test models before release. They cannot require changes. They cannot block deployment.

Multiple agencies are already using Anthropic's Mythos model despite the Pentagon labeling the company a supply chain risk. The NSA reportedly has access; CISA does not. The inconsistency suggests there is no coordinated government policy on frontier model access.

Meanwhile, Congress is debating AI regulation without resolving this core issue: who verifies that a model is safe before millions of people can use it?

The Evaluation Problem

The AISI tested GPT-5.5's cyber capabilities and safeguards separately. It found the model highly capable. It found a universal jailbreak. Then OpenAI patched the jailbreak. The AISI couldn't run final verification tests to confirm the patch worked.

That's the broken state of government evals. Evaluators can identify problems. They can't confirm solutions. Companies control access to final versions.

Rep. Jay Obernolte is drafting comprehensive federal AI regulation. Rep. Sam Liccardo said he won't co-sponsor it because it lacks "critical requirements" for safety. That disagreement reflects a broader question: what does government-led AI oversight actually look like when companies retain control?

For AI in cybersecurity applications, the stakes are especially high. GPT-5.5 can execute attacks that currently require human expertise. Government needs certainty, not trust, that safeguards work.

What Happens Next

OpenAI claims its updated safeguards block all verified high-severity cyber jailbreaks. The only confirmation comes from the company's own red-teaming. An independent evaluator, meanwhile, found a jailbreak within six hours.

Neither OpenAI nor Anthropic should be grading their own homework on safety. Government agencies need binding authority to verify models before release, not advisory power that companies can ignore.

GPT-5.5 might be totally safe. It also might not be. That uncertainty shouldn't depend on a private company's word.

