DT activates RAN Guardian AI to predict event surges and auto-optimize its RAN

Deutsche Telekom's AI-led RAN Guardian Agent, built with Google Cloud, is now live. It monitors network performance, assists with troubleshooting, and applies optimizations in minutes. For operations teams, that means faster fixes and safer automation.

Published on: Nov 12, 2025

DT's RAN Guardian Agent Goes Live: What Operations Teams Should Know

Deutsche Telekom has activated its AI-led RAN Guardian Agent on the live network, built in collaboration with Google Cloud. The system monitors mobile network performance, assists in troubleshooting, and executes optimizations during network events and exceptional situations.

It uses a multi-agent approach with large language models and reasoning frameworks to identify anomalies and trigger corrective actions. DT says workflows that took roughly an hour are now handled in minutes.

How the Agent Operates

  • Event sensing: An initial agent scans public sources like directories and social media to spot upcoming events across Germany, estimate crowd size, and map locations.
  • Capacity assessment: Another agent evaluates nearby antenna capacity, tracks live network parameters, flags high utilization, and recommends optimizations.
  • Execution: A third agent applies changes, such as reallocating resources or adjusting configurations, to stabilize performance.

The agents can also validate one another's outputs, reflecting DT's view that traditional automation is "insufficient for real-time problem solving."
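
To make the three-stage flow concrete, here is a minimal Python sketch of a sense, assess, validate, execute pipeline. It is not DT's implementation: every name in it (Event, Recommendation, run_pipeline, the "add_carrier" action, the 0.8 utilization threshold, the plausibility cap of 5.0) is a hypothetical stand-in, and cell capacity is reduced to a simple user count for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    cell_id: str
    expected_crowd: int


@dataclass
class Recommendation:
    cell_id: str
    action: str
    projected_utilization: float


def sense_events(listings):
    """Stage 1 (event sensing): turn raw listings into Event records."""
    return [Event(l["name"], l["cell_id"], l["crowd"]) for l in listings]


def assess_capacity(event, cell_capacity, high_utilization=0.8):
    """Stage 2 (capacity assessment): recommend a change only for busy cells."""
    capacity = cell_capacity.get(event.cell_id)
    if not capacity:
        return None  # unknown cell or zero capacity: nothing to recommend
    utilization = event.expected_crowd / capacity
    if utilization < high_utilization:
        return None
    # "add_carrier" is a placeholder action name, not a real RAN command.
    return Recommendation(event.cell_id, "add_carrier", round(utilization, 2))


def validate(rec, max_plausible=5.0):
    """Cross-check: reject projections that look like a bad crowd estimate."""
    return 0.0 < rec.projected_utilization <= max_plausible


def execute(rec, applied):
    """Stage 3 (execution): record the change instead of touching a network."""
    applied.append((rec.cell_id, rec.action))


def run_pipeline(listings, cell_capacity):
    """Run all stages and return the list of (cell_id, action) changes made."""
    applied = []
    for event in sense_events(listings):
        rec = assess_capacity(event, cell_capacity)
        if rec is not None and validate(rec):
            execute(rec, applied)
    return applied
```

The point of the separate validate step is that a recommendation built on a wildly wrong crowd estimate never reaches execution, which is the role the article describes for agents checking one another's outputs.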

Leadership's Take

"With the introduction of the RAN Guardian Agent, we are the first network operator to rely on a highly developed AI agent in network management," said Abdu Mudesir, Board member for Product and Technology at Deutsche Telekom. "With an intelligent interaction between our network experts and AI, we are solving specific challenges for the benefit of our customers - for the best network. And we are taking a big step towards autonomous, self-healing networks."

Why It Matters for Operations

  • Faster response: Quicker time-to-detect and time-to-mitigate during spikes and incidents.
  • Event readiness: Proactive planning for concerts, sports, and seasonal surges.
  • Closed-loop actions: Moves from recommend-only to execute-with-guardrails.
  • Reduced toil: Fewer manual escalations and late-night tweaks.

Practical Checklist for Ops Teams

  • Guardrails and policy: Define what the agent can change, when, and with which approvals. Enforce change windows and freeze periods.
  • Human-in-the-loop: Set thresholds for auto-approve vs. operator review. Keep a one-click rollback path.
  • Observability: Log every recommendation and action with context, reasoning, and outcome. Make audits easy.
  • Shadow mode first: Run in recommend-only and compare against operator actions before enabling autonomous execution.
  • KPIs: Baseline time to detect (TTD), time to resolve (TTR), call drop rate, throughput, and customer impact before and after deployment.
  • Data governance: Clarify use of public event data, retention policies, and compliance requirements.
  • Incident playbooks: Integrate with existing on-call, paging, and escalation flows.
  • Integration: Confirm safe interfaces with OSS/BSS, RAN controllers, and configuration management.
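
The guardrail and human-in-the-loop items above can be expressed as a small policy check. The Python sketch below is one possible shape, under stated assumptions: the Policy fields, the action names, the example change window, and the three outcomes ("reject", "review", "auto") are all hypothetical and not taken from DT's system.

```python
from dataclasses import dataclass
from datetime import date, datetime, time


@dataclass(frozen=True)
class Policy:
    allowed_actions: frozenset   # what the agent may change at all
    window_start: time           # start of the daily change window
    window_end: time             # end of the daily change window
    freeze_dates: frozenset      # dates on which no change is allowed
    auto_approve_max_cells: int  # larger changes need operator review


def decide(policy, action, cells_affected, now):
    """Return 'reject', 'review', or 'auto' for a proposed change."""
    if action not in policy.allowed_actions:
        return "reject"
    if now.date() in policy.freeze_dates:
        return "reject"
    if not (policy.window_start <= now.time() <= policy.window_end):
        return "review"  # outside the change window: a human decides
    if cells_affected > policy.auto_approve_max_cells:
        return "review"
    return "auto"


def audit_record(action, cells_affected, now, decision, reasoning):
    """Build a log entry so every decision can be audited later."""
    return {
        "timestamp": now.isoformat(),
        "action": action,
        "cells_affected": cells_affected,
        "decision": decision,
        "reasoning": reasoning,
    }


# Illustrative values only.
EXAMPLE_POLICY = Policy(
    allowed_actions=frozenset({"tilt_adjust", "add_carrier"}),
    window_start=time(8, 0),
    window_end=time(20, 0),
    freeze_dates=frozenset({date(2025, 12, 24)}),
    auto_approve_max_cells=3,
)
```

Keeping the policy as plain data, separate from the agent, means operators can tighten or relax it without touching the model or its prompts.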

Risks and Questions to Ask

  • Accuracy: How are false positives handled when crowd estimates or anomaly detections are off?
  • Model reliability: What safeguards catch LLM reasoning errors? How do agents cross-validate results?
  • Change safety: Are there blast-radius limits, circuit breakers, and automatic rollbacks?
  • Drift and updates: How often are models retrained and revalidated? What's the approval process?
  • Vendor dependence: What's the fallback if a cloud service is degraded? Is there portability across stacks?
  • Privacy and compliance: How is public data processed, stored, and audited?
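
For the change-safety question, it helps to picture what a blast-radius limit, an automatic rollback, and a circuit breaker look like together. The Python sketch below is illustrative only: ChangeBreaker and its limits are hypothetical, and the apply, health-check, and rollback callables stand in for whatever interfaces a real deployment exposes.

```python
class ChangeBreaker:
    """Wraps change execution with a blast-radius cap and a circuit breaker."""

    def __init__(self, max_cells_per_change=5, max_failures=2):
        self.max_cells_per_change = max_cells_per_change
        self.max_failures = max_failures
        self.failures = 0   # consecutive rolled-back changes
        self.open = False   # True once the breaker has tripped

    def apply(self, cells, apply_fn, healthy_fn, rollback_fn):
        """Try one change; return 'blocked', 'too_wide', 'applied', or 'rolled_back'."""
        if self.open:
            return "blocked"  # breaker tripped: no further automated changes
        if len(cells) > self.max_cells_per_change:
            return "too_wide"  # blast-radius limit exceeded, nothing applied
        apply_fn(cells)
        if healthy_fn(cells):
            self.failures = 0
            return "applied"
        rollback_fn(cells)  # automatic rollback on a failed health check
        self.failures += 1
        if self.failures >= self.max_failures:
            self.open = True
        return "rolled_back"

    def reset(self):
        """Operator action: close the breaker after investigation."""
        self.failures = 0
        self.open = False
```

In this sketch the breaker only reopens through an explicit reset, so a run of bad changes forces a human look before automation resumes.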

Tech Stack and Context

The platform was showcased at MWC earlier this year and is built on Google's Gemini 2.5, Cloud Run, BigQuery, and Firestore. For reference on the foundation model stack, see Google Cloud Gemini.

What to Do Next

  • Define a contained pilot (event-focused cells) and clear SLOs.
  • Run a side-by-side trial: agent recommendations vs. human actions.
  • Adopt staged autonomy: recommend-only → supervised execute → time-bound full execute.
  • Train your NOC/RAN teams on review workflows and rollback drills.
  • Hold a postmortem after the first major event to refine thresholds and policies.
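
The side-by-side trial and staged autonomy steps can be tied together with one number: how often the agent's recommendation matched what the operator actually did. The Python sketch below shows one way to compute that and use it as a promotion gate. The Stage names mirror the three stages listed above; the 0.9 threshold, the function names, and the incident-keyed dictionaries are assumptions made for illustration.

```python
from enum import Enum


class Stage(Enum):
    RECOMMEND_ONLY = 1
    SUPERVISED_EXECUTE = 2
    TIME_BOUND_FULL_EXECUTE = 3


def agreement_rate(agent_actions, human_actions):
    """Share of shared incidents where agent and operator chose the same action.

    Both arguments map an incident id to the action taken or recommended.
    Incidents seen by only one side are ignored.
    """
    shared = agent_actions.keys() & human_actions.keys()
    if not shared:
        return 0.0
    matches = sum(1 for k in shared if agent_actions[k] == human_actions[k])
    return matches / len(shared)


def next_stage(stage, rate, min_rate=0.9):
    """Promote one stage when agreement is high enough; otherwise stay put."""
    if rate < min_rate or stage is Stage.TIME_BOUND_FULL_EXECUTE:
        return stage
    return Stage(stage.value + 1)
```

A real rollout would likely weigh more than agreement (for example, cases where the agent disagreed and was right), but a single explicit gate makes the promotion decision reviewable in the postmortem.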

The concept looks strong on paper. The real test is sustained performance in live traffic, starting with events and expanding to broader scenarios.


