AI Agents Move From Experiment to Essential: What Ops Leaders Need to Do Now
AI has crossed the threshold from trial to critical infrastructure. In a new survey of 1,500 IT and business executives across Australia, France, Germany, Japan, the U.K., and the U.S., PagerDuty reports that 74% say their company would struggle to function without AI. Trust is rising too: 81% would let AI agents take action during a crisis like a service outage or security event.
Agentic AI is spreading fast. Three out of four companies have deployed more than one AI agent, and one in four already run five or more.
Key Signals for Operations
- Trust in crisis: 81% trust AI agents to act during outages or security events.
- AI is essential: 74% say their company would struggle without AI; this rises to 77% for companies under 10,000 employees. Among C-suites and owners, it's 83% vs. 73% for directors and VPs.
- Multiple agents: 75% have deployed more than one AI agent; 25% have five or more.
- Engineering adoption: 84% use AI to write, review, or suggest code. That jumps to 91% for companies with multiple agents vs. 68% with one agent and 44% with none.
- Testing gap: 85% test AI-generated code, but only 39% do so consistently with formal processes. The U.S. leads at 59%; Japan trails at 19%.
- Guardrails behind adoption: 85% say they need better procedures to detect AI errors or failures (highest in France at 90%).
- AI-related outages are common: 84% have experienced at least one. Of those without an outage, 57% already have protocols ready.
- Complexity is catching teams off guard: 76% with one agent, and 79% with multiple agents, believe AI complexity will outpace staffing. Only 57% without agents see it coming.
Why Confidence Is Increasing
- Better outputs (49%)
- More frequent usage with positive results (48%)
- Improved understanding (47%)
- Stronger oversight measures (45%)
The Real Risk Picture
Adoption is outpacing governance. Most companies have already experienced at least one AI-related outage. Many are shipping AI-assisted code without consistent, formal testing. And as teams add more agents, operational complexity grows faster than headcount.
This is the moment to treat AI agents as production systems with clear owners, SLOs, runbooks, and audits.
What to Implement This Quarter
- Define crisis actions: List which actions agents can perform autonomously vs. those requiring human approval. Add clear RACI and escalation thresholds.
- Formalize testing for AI-generated code: Apply unit, integration, security, and policy checks. Add prompt/response test cases and red teaming for edge cases. Gate merges with automated checks.
- AI incident response: Create runbooks for detection, isolation, rollback, and safe fallbacks to manual flows. Track MTTD/MTTR for AI-specific incidents.
- Observability and auditability: Log prompts, model versions, responses, and decisions with PII redaction. Keep audit trails and change history.
- Guardrails: Add policy enforcement, content filters, rate limiting, canary deploys, and a quick kill switch. Monitor for drift and abnormal behavior.
- Model and prompt versioning: Version models, embeddings, prompts, and tools. Require change approvals and rollback paths.
- Resilience planning: Map dependencies (model providers, vector stores, feature services). Create provider failover, cached results, and degradation modes. Run chaos drills for AI service loss.
- Security and privacy: Enforce least privilege for agents and tools, secrets isolation, and egress controls. Scan AI packages and templates for supply-chain risk.
- Cost controls: Set token budgets, cost alerts, and unit economics per workflow. Tag usage to teams and services.
- Service catalog for agents: List every agent with owner, scope, SLO, interfaces, and runbooks. Review quarterly.
- Upskill your team: Train ops, SRE, and on-call roles on agent behavior, failure modes, and safe rollout practices.
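To make the first three items concrete, here is a minimal sketch of a policy gate that classifies agent actions as autonomous vs. approval-required, supports a global kill switch, and writes an audit trail. Every name in it (AgentGate, AUTONOMOUS_ACTIONS, the action strings) is hypothetical and for illustration only, not part of PagerDuty or any vendor API:

```python
import time
from dataclasses import dataclass, field

# Hypothetical action tiers: which operations an agent may run on its own
# vs. which require human sign-off before execution.
AUTONOMOUS_ACTIONS = {"restart_service", "scale_up"}
APPROVAL_ACTIONS = {"rotate_credentials", "failover_db"}

@dataclass
class AgentGate:
    """Illustrative gate: decides and audits every agent action request."""
    kill_switch: bool = False                     # global stop for all agents
    audit_log: list = field(default_factory=list)

    def request(self, agent: str, action: str, approved: bool = False) -> str:
        if self.kill_switch:
            decision = "blocked"                  # halt everything instantly
        elif action in AUTONOMOUS_ACTIONS:
            decision = "allowed"                  # safe to automate
        elif action in APPROVAL_ACTIONS and approved:
            decision = "allowed"                  # human approval recorded
        else:
            decision = "escalated"                # route to on-call for review
        # Audit trail: who asked, for what, and what was decided.
        self.audit_log.append({
            "ts": time.time(),
            "agent": agent,
            "action": action,
            "decision": decision,
        })
        return decision

gate = AgentGate()
print(gate.request("remediation-bot", "restart_service"))     # allowed
print(gate.request("remediation-bot", "rotate_credentials"))  # escalated
gate.kill_switch = True
print(gate.request("remediation-bot", "restart_service"))     # blocked
```

In a real system the decision tables would live in version-controlled policy config, and the audit log would ship to your observability pipeline rather than an in-memory list; the point is that every action passes through one auditable choke point with a kill switch.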
What This Means for Your Roadmap
AI is now core infrastructure. Treat agents like any production service: reliability first, clear controls, measurable outcomes. Start with the high-impact, low-risk workflows, add guardrails early, and scale only after you can test, monitor, and recover fast.
As PagerDuty's David Williams notes, "Companies that embed automation and agents into their operations will see AI drive efficiency, reduce costs, and strengthen customer trust." The data suggests most leaders agree, and they are acting on it.
Get the Full Findings
Read the complete release and methodology here: PagerDuty AI Resilience Survey.
For a practical framework on AI governance and risk, see NIST's AI Risk Management Framework.
If you're building operations skills and standards for AI agents, explore role-based learning paths: AI courses by job.