OpenAI's Codex Security is out: 14 CVEs, fewer false positives, and proof-of-concept exploits

OpenAI's Codex Security scans code, builds a threat model, and validates bugs in a sandbox with PoC exploits, then suggests patches. Now in research preview, with the first month free.

Published on: Mar 08, 2026

OpenAI released Codex Security on March 6: an application security agent that scans codebases, validates real issues in sandboxed environments, and proposes patches. Formerly known as Aardvark, it spent about a year in private beta and is now available as a research preview for ChatGPT Pro, Enterprise, Business, and Edu customers. The first month is complimentary.

What's different about this agent

Codex Security doesn't just run static checks. It builds a project-specific threat model first, mapping what the system does, what it trusts, and where risk concentrates. Teams can edit that model so findings match their risk posture. When configured with a tailored environment, it pressure-tests suspected issues against a running system and generates proof-of-concept exploits to confirm impact.

  • Project-aware scanning via editable threat models
  • Sandbox validation with PoC exploits
  • Patch suggestions aligned to verified findings
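
OpenAI hasn't published what these threat models look like under the hood. Purely as a mental model, an editable, project-specific threat model might capture entry points, trust boundaries, and risk weighting along these lines (every name below is hypothetical, not a Codex Security format):

```python
from dataclasses import dataclass, field

@dataclass
class TrustBoundary:
    """A point where data crosses from a less-trusted zone to a more-trusted one."""
    name: str
    source: str                # e.g. "public internet"
    sink: str                  # e.g. "payment ledger"
    data_handled: list[str] = field(default_factory=list)

@dataclass
class ThreatModel:
    """Editable risk map a scanner could rank findings against."""
    service: str
    entry_points: list[str]
    boundaries: list[TrustBoundary]
    # Higher weight = findings here outrank equally severe findings elsewhere.
    risk_weights: dict[str, float] = field(default_factory=dict)

# Example: a payments service where webhook ingress is the riskiest surface.
model = ThreatModel(
    service="payments-api",
    entry_points=["POST /checkout", "POST /webhooks/provider"],
    boundaries=[
        TrustBoundary(
            name="webhook-ingress",
            source="public internet",
            sink="payment ledger",
            data_handled=["card tokens", "order totals"],
        )
    ],
    risk_weights={"webhook-ingress": 1.0, "admin-ui": 0.4},
)
```

Editing that map is the lever the article describes: raise or lower a boundary's weight and findings re-rank to match your actual risk posture.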

Performance at scale

Over the past 30 days of beta testing, Codex Security scanned more than 1.2 million commits across external repositories. It surfaced 792 critical findings and 10,561 high-severity issues. Critical vulnerabilities appeared in fewer than 0.1% of scanned commits, which helps reviewers focus.
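
The sub-0.1% figure follows directly from the reported numbers; a quick check, using 1.2 million as a lower bound on commits scanned:

```python
commits = 1_200_000   # "more than 1.2 million" commits scanned over 30 days
criticals = 792       # critical findings surfaced

rate = criticals / commits
print(f"{rate:.4%}")  # 0.0660% -- comfortably under the 0.1% threshold
```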

  • Noise down 84% from initial rollout to current version
  • False positives reduced by 50%+
  • Over-reported severity down by 90%+
  • User feedback on criticality refines future scans

That precision speaks to a persistent concern: a 2025 study across 80 coding tasks and 100+ LLMs found AI-generated code introduced vulnerabilities in 45% of cases. If AI writes more code, higher-signal detection becomes mandatory.

Open-source impact

OpenAI has been running Codex Security on the open-source projects it depends on. It reported high-impact findings to maintainers across OpenSSH, GnuTLS, GOGS, Thorium, libssh, PHP, and Chromium. The tool has earned 14 CVE designations so far, two of them reported in parallel with other researchers.

Maintainers told OpenAI the problem isn't too few reports; it's too many low-quality ones. That feedback pushed the agent toward high-confidence findings over volume to reduce triage load.

Access, pricing, and current gaps

Codex Security is a research preview delivered through the Codex web experience. There's no API-level integration yet, which may slow adoption for teams with existing automation pipelines. OpenAI hasn't disclosed post-trial pricing or which frontier model powers the agent.

The move puts OpenAI into direct competition with Snyk, Semgrep, and Veracode. Google also detailed security architecture for AI agent features in Chrome, signaling this space is getting serious attention from multiple directions.

What this means for security, IT, and development teams

If you manage production code, treat Codex Security as a high-signal second set of eyes, not a replacement for your existing pipeline. Use it to confirm real risk and speed response without bloating your backlog.

  • Run it in a staging or mirrored environment to validate findings with PoCs before ticketing.
  • Customize the threat model so alerts reflect your trust boundaries and business risk.
  • Wire results into code review and incident playbooks; prioritize verified criticals first (see the sketch after this list).
  • Track false positives and severity drift; feed adjustments back to improve precision over time.
  • For OSS work, coordinate disclosures and patches with maintainers to minimize churn.
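
There's no API yet, so any automation starts from whatever findings you can export by hand. Purely as a sketch, assuming a hand-rolled JSON export (the file name and field names below are invented, not a documented Codex Security format), a triage filter that tickets only sandbox-verified findings might look like:

```python
import json
from pathlib import Path

# Hypothetical export: a JSON list of findings, each with an id, title,
# severity, and a "verified" flag meaning a PoC confirmed it in a sandbox.
findings = json.loads(Path("codex_security_export.json").read_text())

SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

# Ticket only PoC-verified findings, highest severity first, matching the
# "prioritize verified criticals" guidance above.
verified = [f for f in findings if f.get("verified")]
verified.sort(key=lambda f: SEVERITY_ORDER.get(f["severity"], 99))

for finding in verified:
    print(f"[{finding['severity'].upper()}] {finding['id']}: {finding['title']}")
```

If API access ships later, the same filter logic should carry over; until then the export step stays manual.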

For open-source maintainers

OpenAI announced Codex for OSS: free ChatGPT Pro and Plus accounts, code review support, and Codex Security access for maintainers. The vLLM project has already used the tool to find and patch issues inside normal workflows. Expect broader access in the coming weeks.

Key takeaways

  • Agent-first approach: threat models before scans, then PoC-backed validation.
  • Signal over noise: fewer, higher-confidence findings that reduce triage.
  • Open questions: pricing, model details, API availability, and whether precision holds at scale.
