UK AI Security Institute: Safeguards improving, risks persist
The UK government's AI Security Institute (AISI) has released its first Frontier AI trends report after two years of testing advanced systems across cyber security, software engineering and the sciences. The headline: safety measures are getting better, yet every model tested could still be bypassed in some way.
The report aims to replace speculation with data. It provides a public baseline for how advanced models behave and where protections are holding up - or failing - under pressure.
What the report says
Across multiple model generations, AISI red-teamers needed more time to find a universal jailbreak. That window stretched from minutes to several hours - a meaningful shift for operational risk and incident response.
Safeguards still vary widely by model. No system was immune to bypass, but the trend points in the right direction: harder to break, more effort required, fewer trivial exploits.
Government officials stress the strategy: test with developers, find weaknesses early, and fix issues before broad deployment. The goal is simple - raise safety standards as capabilities advance.
Key capability trends
- Cyber tasks: Apprentice-level tasks are now solved ~50% of the time (up from under 10% two years ago). For the first time, an AI system completed an expert task typically requiring up to a decade of human experience.
- Autonomy: The amount of work AI can do without human direction appears to be doubling roughly every eight months (a rough extrapolation follows this list). Early signs of autonomous behaviour were seen only in tightly controlled tests. No harmful or spontaneous actions were observed, but ongoing monitoring is needed.
- Software engineering: Many models can now complete hour-long coding tasks over 40% of the time (up from 5% in 2023).
- Biology and chemistry: Some systems outperform PhD-level scores on knowledge tests and bring advanced lab-style assistance closer to everyday users.
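To make the autonomy trend concrete, here is a minimal extrapolation sketch. It is an illustration of the "doubling roughly every eight months" claim, not AISI's methodology; the starting task length and the horizon are assumptions.

```python
# Illustrative extrapolation of the "doubling roughly every eight months" trend.
# The starting task length is an assumption, not a figure from the report.
DOUBLING_MONTHS = 8
start_task_hours = 1.0  # hypothetical: today's longest reliably autonomous task

for months_ahead in (8, 16, 24, 32):
    doublings = months_ahead / DOUBLING_MONTHS
    projected = start_task_hours * (2 ** doublings)
    print(f"In {months_ahead} months: ~{projected:g} hours of autonomous work")
```

Under those assumptions, a one-hour autonomous task today becomes a full working day of unsupervised work within about two years - which is why the oversight limits below matter.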
Implications for public sector leaders
Progress is real, but so are the gaps. Treat current safeguards as speed bumps, not walls. Plan for failure, measure drift, and require independent checks before pilots scale across critical services.
Use this report to update risk registers, procurement criteria and operational playbooks. The target is safe usefulness at scale - under audit, with guardrails, and with people in the loop.
Procurement and oversight checklist
- Evidence on the table: Require recent red-team results (including jailbreak attempts), model cards, and clear failure modes for any system under consideration.
- Metrics that matter: Track "time to universal jailbreak," false-negative/false-positive rates, and the longest task the model can run without oversight (see the sketch after this list).
- Controls by default: Human-in-the-loop steps for high-impact actions; rate limits; data loss prevention; privilege separation; network egress controls for agentic tools.
- Logging and audit: Immutable logs, prompt and tool-use traces, and rapid revocation paths. Test incident response with quarterly live exercises.
- Safety updates: Contractual obligations for timely model and policy updates, plus a vulnerability disclosure process.
- Sensitive domains: For bio/chem or security use cases, require extra review, restricted tooling, and pre-approved datasets.
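To make the "metrics that matter" item concrete, here is a minimal sketch of how a department might record and gate on that evidence. The field names, thresholds and example numbers are assumptions for illustration; they are not taken from the AISI report or any vendor's API.

```python
from dataclasses import dataclass

# Hypothetical record of red-team and evaluation evidence for one model version.
# Field names and thresholds are illustrative; set them per your risk appetite.
@dataclass
class SafetyEvidence:
    model_version: str
    time_to_universal_jailbreak_hours: float  # from recent red-team exercises
    false_positive_rate: float                # safeguard blocks benign requests
    false_negative_rate: float                # safeguard misses harmful requests
    max_unsupervised_task_minutes: int        # longest task run without oversight

def meets_procurement_bar(e: SafetyEvidence,
                          min_jailbreak_hours: float = 4.0,
                          max_fnr: float = 0.05,
                          max_unsupervised_minutes: int = 60) -> bool:
    """Return True if the evidence clears department-defined thresholds."""
    return (e.time_to_universal_jailbreak_hours >= min_jailbreak_hours
            and e.false_negative_rate <= max_fnr
            and e.max_unsupervised_task_minutes <= max_unsupervised_minutes)

# Example: evidence supplied for a candidate model (made-up numbers).
candidate = SafetyEvidence("model-x-2025-06", 6.5, 0.08, 0.03, 45)
print(meets_procurement_bar(candidate))  # True with the defaults above
```

The point is not the specific thresholds but that they are written down, versioned, and re-checked whenever the evidence changes.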
Operational guidance for departments
- Pilots with guardrails: Start with low-risk, high-volume workflows. Gate any autonomous features behind clear spend, time, and action limits (a minimal example follows this list).
- Cyber teams: Let models handle apprentice-level tasks (triage, enrichment, routine checks) with human review. Measure precision, not just throughput.
- Change control: Treat model version shifts like major software releases. Re-run evaluations after every significant update.
- Data hygiene: Minimise personal data in prompts, use redaction where possible, and sandbox sensitive information.
- Public-facing services: Abuse detection, content filtering, and clear escalation paths for users and staff.
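As a sketch of the "pilots with guardrails" item, the snippet below gates a hypothetical agent action behind spend, time and action-count limits, with a human-approval step for high-impact actions. The limits, action names and approve() hook are assumptions for illustration, not part of any specific agent framework.

```python
import time

# Illustrative guardrail wrapper for an agentic pilot.
# Limits and the approval hook are assumptions; tune them to your service.
LIMITS = {
    "max_spend_gbp": 50.0,   # cumulative spend per session
    "max_runtime_s": 900,    # wall-clock budget per session
    "max_actions": 25,       # total tool calls per session
}
HIGH_IMPACT_ACTIONS = {"send_email", "make_payment", "change_record"}

class GuardrailExceeded(Exception):
    pass

class Session:
    def __init__(self):
        self.started = time.monotonic()
        self.spend_gbp = 0.0
        self.actions = 0

    def check(self, action: str, cost_gbp: float, approve) -> None:
        """Raise if any limit is breached; require human approval for high-impact actions."""
        self.actions += 1
        self.spend_gbp += cost_gbp
        if self.spend_gbp > LIMITS["max_spend_gbp"]:
            raise GuardrailExceeded("spend limit reached")
        if time.monotonic() - self.started > LIMITS["max_runtime_s"]:
            raise GuardrailExceeded("time limit reached")
        if self.actions > LIMITS["max_actions"]:
            raise GuardrailExceeded("action limit reached")
        if action in HIGH_IMPACT_ACTIONS and not approve(action):
            raise GuardrailExceeded(f"human approval denied for {action}")

# Usage: a reviewer callback stands in for a real human-in-the-loop step.
session = Session()
session.check("lookup_record", cost_gbp=0.02, approve=lambda a: False)  # low impact, allowed
session.check("send_email", cost_gbp=0.01, approve=lambda a: True)      # needs approval first
```

The design choice worth copying is that the limits live outside the model: the agent cannot negotiate its own budget, and every breach produces an auditable event.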
Why this matters for policy
The report is not a policy blueprint; it is a shared evidence base. Government, industry and researchers will need to co-invest in evaluation so deployment keeps pace with capability jumps.
As Jade Leung of AISI notes, independent testing is the anchor as systems advance. That shared discipline is how we deliver growth, better services and public trust at the same time.
Where to learn more
- UK AI Security Institute (AISI) - official updates, evaluations and publications.
- NCSC: Guidelines for secure AI system development - practical engineering guidance for teams.
Upskill your team
If you're building capability inside a department or agency, a structured path helps. See role-based options here: Complete AI Training - courses by job.