AI Agents on Offense and Defense: Can Cyber Security Keep Up?

AI agents boost productivity but widen your blast radius, creating new paths for phishing, data theft and misuse. Adopt them with least privilege, guardrails, and human oversight.

Published on: Sep 19, 2025
AI Agents on Offense and Defense: Can Cyber Security Keep Up?

Are AI agents a blessing or a curse for cyber security?

AI agents are moving from demos to daily work. They schedule meetings, read inboxes, browse the web, and act across systems with your permissions. That convenience comes with risk: anything that can act for you can be tricked into acting against you.

What an AI agent really is

At its core, an agent is a generative model wired to tools. It plans steps, calls APIs, reads files, and executes tasks to reach a goal you set. Book a room, summarize a thread, push a ticket, even apply an update-no human clicks required.

As these agents gain access to email, browsers, storage, and voice, they inherit your blast radius. That's the trade-off.

Why attackers care

Attackers love anything that breaks big work into small jobs. Agentic systems do that by design. They can analyze code, search docs, stitch outputs, and keep going until the goal is met.

Add weak guardrails and broad permissions, and you get new paths for phishing, data theft, and initial access. If an agent can move files, send emails, or trigger workflows, a prompt injection can turn those features into an entry point.

Where the risk shows up

  • Prompt injection and jailbreaking: Malicious content embedded in emails, web pages, or files can steer agents to leak data or execute bad actions. See the OWASP Top 10 for LLM Apps for common classes.
  • Browser/email agents: If an agent reads mail or crawls links, an attacker can plant instructions in plain sight. The agent follows "helpful" steps and exfiltrates data.
  • Over-permissioned tools: File managers, ticketing, CI/CD, or voice features wired without least privilege create high-impact failure modes.
  • Speed-to-ship gaps: Vendors race to release features, sometimes before abuse cases and guardrails are mature.

Use agents to fight agents

Defenders can lean on the same automation. Multi-agent setups can scan threat sources, enrich signals, and prioritize action. They can pull CVEs, CISA alerts, Patch Tuesday notes, and social chatter, then draft hunts and change tickets.

This removes hours of swivel-chair work and moves teams from reactive to prepared. It won't replace people, but it will let them focus on the small set of decisions that matter.

Faster SOC workflows

  • Summarize incidents and map to similar past cases instantly.
  • Suggest next best queries, enrichment steps, and containment playbooks.
  • Suppress noise by explaining why certain leads are weak, so analysts don't chase ghosts.

Practical safeguards for teams building or adopting agents

  • Least privilege by default: Separate "read," "write," and "execute" scopes. Issue narrow, short-lived tokens. Rotate secrets automatically.
  • Tooling allowlists: Expose only vetted tools to the agent. Block file system and network access unless needed. Require signed tool calls and audit every call.
  • Content controls at the edge: Sanitize and classify inputs before the model sees them. Strip active content and block known bad domains.
  • Prompt injection defenses: Use grounded templates, input/output validators, and instruction firewalls. Treat all external content as untrusted.
  • Isolation: Run agents in sandboxes with network egress rules. Use separate tenants/projects for prod vs. experimentation.
  • Human-in-the-loop for high-impact actions: Require approvals for money movement, account changes, privilege grants, code pushes, or mass emails.
  • Telemetry first: Log prompts, tool calls, data touched, and outcomes. Stream to your SIEM. Add detections for abnormal sequences and data volumes.
  • Rate limits and kill switch: Cap actions per minute, per tool, per user. Provide a one-click stop that revokes tokens and halts jobs.
  • Vendor due diligence: Ask about model routing, data retention, training on customer data, and red-teaming practices. Demand SSO, SCIM, and granular RBAC.
  • Secure update path: For auto-patching agents, verify signatures, test in staging, and enforce maintenance windows.
  • Education: Teach staff how prompt injection looks in emails, docs, and tickets. Treat agent outputs as suggestions, not truth.

Metrics that show progress

  • Mean time to detect/respond for emergent CVEs and mass-exploitation events.
  • Patch SLAs on critical systems after agent-driven triage.
  • False positive rate and investigation time per incident.
  • Number of incidents caused by agent actions (should trend to zero).

What to do this quarter

  • 30 days: Inventory all agents and their tool permissions. Remove unused scopes. Add logging and rate limits.
  • 60 days: Pilot an analyst-assist agent in the SOC with read-only access. Measure alert triage time and quality.
  • 90 days: Roll out an automated threat intel pipeline that drafts hunts and tickets for human approval.

Where this leaves security teams

Attackers need one opening. Defenders have to cover the entire surface. Agents raise the stakes on both sides, and they're not going away.

Adopt them with intent, control their blast radius, and keep a human on the loop. That's how you get the upside without handing over the keys.

Level up team skills: If you're aligning roles and training for AI-driven operations, explore AI courses by job to close gaps fast.