Frontier AI: the UK's clearest evidence to date on capability and safety
Published: 18 December 2025
The AI Security Institute (AISI) has released its inaugural Frontier AI Trends Report, an evidence-first view of what the most advanced AI systems can actually do. After two years of testing across cyber security, software engineering, biology, and chemistry, the data shows meaningful progress on both capability and protections. Systems can still be bypassed, but the bar is rising: the average time to find a "universal jailbreak" has stretched from minutes to several hours across model generations, roughly a 40x slowdown for attackers.
This is not a policy prescription. It's a clear baseline that departments, agencies, and regulators can use to make better calls on procurement, assurance, skills, and oversight as AI adoption grows.
Why this matters for government
We now have public, test-based numbers instead of speculation. The report will be a regular publication, giving departments a common reference point to compare models, track progress, and pressure-test safeguards before systems touch critical services.
Key findings
- Cyber security: Success on apprentice-level tasks rose from under 9% in 2023 to around 50% in 2025. For the first time, a model completed an expert-level task typically requiring up to 10 years of experience.
- Software engineering: Models now complete hour-long engineering tasks more than 40% of the time (up from below 5% two years ago).
- Biology and chemistry: Systems outperform PhD-level researchers on scientific knowledge tests and can help non-experts carry out lab work that used to be out of reach.
- Pace of change: The length of the cyber tasks that some models can complete without human direction is roughly doubling every eight months (see the illustrative calculation after this list).
- Safeguards: Protections are improving, though they vary by company. Every system tested remains vulnerable to some form of bypass, but "universal jailbreaks" are getting harder to find, now taking AISI red-teamers hours rather than minutes.
- Autonomy signals: Early signs of autonomy appear only in controlled experiments; no model showed harmful or spontaneous behavior in AISI tests. Continued tracking is essential as capability increases.
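To make the doubling-time figure concrete, the minimal sketch below compounds it over two years. The one-hour baseline task length is a hypothetical assumption for illustration; only the eight-month doubling period comes from the report.

```python
# Illustrative sketch only: what an eight-month doubling time implies if the trend holds.
# BASELINE_MINUTES is an assumed figure for illustration, not a number from the AISI report.

BASELINE_MINUTES = 60        # assumed length of a task a model can finish unaided today
DOUBLING_MONTHS = 8          # doubling period reported by AISI

for months in (8, 16, 24):
    factor = 2 ** (months / DOUBLING_MONTHS)
    hours = BASELINE_MINUTES * factor / 60
    print(f"After {months} months: ~{factor:.0f}x the current task length -> ~{hours:.0f} hours")
```

If the trend held, the length of task a model could handle unaided would grow roughly eightfold over two years.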
What AISI is doing
Created in 2023, AISI is the UK's state-backed hub for evaluating frontier AI. Its testing team, the largest of any government-backed AI body, works directly with major developers to identify and fix vulnerabilities before systems are widely deployed. This collaboration, paired with ongoing investment in evaluation and AI science, aims to convert capability into secure growth, better services, and new jobs.
What leaders are saying
AI Minister Kanishka Narayan emphasized that the UK is stress-testing leading systems and fixing weaknesses before they reach scale, building scientific capability inside government, and putting evidence ahead of hype to deliver growth and better public services while keeping trust and safety central.
Jade Leung, the Prime Minister's AI Adviser and AISI's Chief Technology Officer, highlighted that the report provides some of the strongest public evidence from a government body on how quickly frontier AI is advancing, and underscored the value of independent evaluation that keeps pace with it.
What this is, and what it is not
- It is: Controlled tests that measure concrete capabilities relevant to safety, security, innovation, and growth.
- It is not: A forecast of the future or a direct read on real-world risk today. Treat it as a critical input to decisions, not the only input.
Actions for departments now
- Require third-party evaluation results and documented safety measures at procurement and assurance gates.
- Set clear expectations for model red-teaming, monitoring, and incident response, consistent with independent testing such as AISI's.
- Coordinate with AISI on pre-deployment testing and responsible disclosure for vulnerabilities discovered in pilots or live services.
- Track autonomy-related signals in your risk registers and update mitigations as capability metrics move.
- Upskill policy, security, and service teams to interpret AI evaluations and deploy AI safely.
Learn more
For a complementary reference on evaluation and risk controls, see the NIST AI Risk Management Framework.