MWC 2026: Google Cloud pushes telcos toward Level 4-5 automation with AI
At Mobile World Congress in Barcelona, Google Cloud laid out a plan to move operators closer to Level 4 and Level 5 automation on the TM Forum autonomous networks scale. The focus: make the data platform as dynamic as the network, then use AI to shorten outages, compress change windows and keep SLAs intact.
This isn't just talk. Google Cloud is working with Mas Orange, Vodafone, Deutsche Telekom (DT), DigitalRoute, One New Zealand and Nokia's network-as-code platform on live automation projects. Meanwhile, AWS and Nokia said Orange and du are piloting agentic AI for 5G slicing - trials today with a path to production.
Why this matters for operations
- Faster root cause analysis and fewer false escalations.
- Closed-loop changes with guardrails instead of manual midnight pushes.
- Predictive maintenance that cuts MTTR (mean time to repair) and protects premium slice SLAs.
- Unified data flows that reduce swivel-chair ops between tools and teams.
What Google Cloud actually put on the table
- Network digital twin (dynamic, temporal graph): Moves from a static map to a live graph of physical and logical state. Captures real-time performance and fault conditions, while supporting "time travel" queries so engineers can inspect the network as it looked hours or days ago for instant, accurate RCA.
- Unified graph data layer: Breaks silos between operational and analytical data. Uses Cloud Spanner Graph for the twin, with federated graph analytics through BigQuery to run complex, cross-domain queries without heavy data movement.
- Real-time predictions with GNNs: Train graph neural networks (GNNs) on twin data in Vertex AI, then serve predictions using Spanner's ML.PREDICT plus live twin feeds. This shifts ops from monitoring to predicting - modeling failure propagation and acting before subscribers feel pain.
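The "time travel" idea behind the twin is easy to sketch in plain Python: keep timestamped topology snapshots and answer "state as of T" and "what changed between T1 and T2" queries. Everything below (class and field names, in-memory storage) is an illustrative assumption, not the Spanner Graph API - a real twin would use the database's own timestamp reads.

```python
import bisect
from dataclasses import dataclass, field

@dataclass
class TemporalTwin:
    """Toy temporal network twin: append-only, timestamped topology snapshots.

    Hypothetical sketch -- a production twin would live in a graph store
    (e.g. Spanner Graph) with native timestamp reads, not in memory.
    """
    _times: list = field(default_factory=list)   # sorted snapshot timestamps
    _snaps: list = field(default_factory=list)   # edge sets: {(node_a, node_b), ...}

    def record(self, ts: int, edges: set) -> None:
        # Snapshots must arrive in time order for bisect to work.
        assert not self._times or ts > self._times[-1]
        self._times.append(ts)
        self._snaps.append(frozenset(edges))

    def as_of(self, ts: int) -> frozenset:
        """Topology as it looked at time ts (a 'state as of T-3h' query)."""
        i = bisect.bisect_right(self._times, ts) - 1
        return self._snaps[i] if i >= 0 else frozenset()

    def what_changed(self, t1: int, t2: int) -> dict:
        """Diff two points in time: links added and removed in between."""
        a, b = self.as_of(t1), self.as_of(t2)
        return {"added": b - a, "removed": a - b}

# Usage: a fiber cut between the two snapshots shows up as a removed link.
twin = TemporalTwin()
twin.record(1000, {("ran1", "agg1"), ("agg1", "core1")})
twin.record(1300, {("ran1", "agg1")})            # agg1-core1 link lost
print(twin.what_changed(1000, 1300))
```

The on-call value is the diff: instead of paging through device logs, the engineer asks the twin what links or nodes changed in the incident window.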
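The failure-propagation idea in the GNN bullet can also be illustrated without any ML stack: message passing blends each node's risk score with its neighbours' scores, so a fault "spreads" along the topology. This toy function is an assumption-laden stand-in - a real model trained in Vertex AI learns the weights from labelled incidents rather than using a fixed blend factor.

```python
def propagate_risk(edges, risk, rounds=2, alpha=0.5):
    """Toy GNN-style message passing: each round, a node's risk is blended
    with the mean risk of its neighbours (alpha controls the blend).
    Illustrative only -- a trained GNN learns these weights from data."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, []).append(b)
        nbrs.setdefault(b, []).append(a)
    for _ in range(rounds):
        risk = {
            n: (1 - alpha) * r + alpha * (
                sum(risk[m] for m in nbrs[n]) / len(nbrs[n])
                if nbrs.get(n) else r
            )
            for n, r in risk.items()
        }
    return risk

# A failing aggregation node raises predicted risk on everything attached to it.
edges = [("agg1", "cell1"), ("agg1", "cell2"), ("agg1", "core1")]
risk = {"agg1": 0.9, "cell1": 0.1, "cell2": 0.1, "core1": 0.1}
print(sorted(propagate_risk(edges, risk).items()))
```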
Proof that autonomy is landing
For years, "self-driving networks" sounded like wishful thinking. Now there are production examples; DT is already running pieces of this without a human in the loop for every change. The pattern is clear: AI agents assist humans first, then take on well-bounded actions with policy guardrails.
How to turn this into wins in your NOC
- Stabilize your data foundation: Centralize telemetry (events, metrics, traces, configs), standardize IDs for assets and links, enforce time sync and retention by domain. Map what lands in the twin vs. cold storage.
- Prioritize high-impact use cases: Start with alarm correlation, RAN congestion mitigation, fiber break prediction and 5G slice assurance. Define target KPIs and a stop/go threshold before scaling.
- Build the twin incrementally: Load inventory, auto-discover topology, then layer real-time state. Add temporal snapshots so on-call can query "what changed" across a window, not just a device.
- Stand up MLOps for GNNs: Label historical incidents, automate feature pipelines from the twin, set drift monitors and rollback paths. Treat models like code with versioning and staged rollouts.
- Automate safely: Use policy-as-code, approval workflows and canary actions. Keep humans in the loop until action accuracy and blast-radius controls meet your thresholds.
- Integrate with your stack: Pipe predictions and recommended actions into ITSM, paging and runbooks. Close the loop by auto-attaching twin views to tickets.
- Track ROI ruthlessly: MTTR, time-to-first-response, proactive tickets created, slice SLA violations avoided, change success rate and OPEX per site/GB/user.
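The "automate safely" step above reduces to a gate function: an action runs automatically only if model confidence, blast radius and change-window policy all pass; otherwise it is canaried or escalated to a human. The action fields and thresholds below are illustrative assumptions, not recommended production values.

```python
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    confidence: float      # model confidence in the recommended action
    affected_sites: int    # blast radius
    in_change_window: bool

def gate(action, min_conf=0.9, max_sites=5):
    """Policy-as-code sketch: returns 'auto', 'canary', or 'escalate'.
    Thresholds are hypothetical, tuned per operator in practice."""
    if action.confidence < min_conf or not action.in_change_window:
        return "escalate"                 # keep a human in the loop
    if action.affected_sites > max_sites:
        return "canary"                   # try on a small subset first
    return "auto"                         # all guardrails passed

print(gate(Action("restart-du", 0.95, 2, True)))      # auto
print(gate(Action("reroute-slice", 0.95, 40, True)))  # canary
print(gate(Action("reroute-slice", 0.60, 1, True)))   # escalate
```

The design point: the policy is ordinary, reviewable code, so the thresholds can live in version control and tighten or loosen as action accuracy is proven.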
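For the ROI step, MTTR is just the mean of detect-to-restore durations; comparing a baseline window against a pilot window shows whether automation is paying off. A minimal sketch with made-up incident times:

```python
from datetime import datetime, timedelta

def mttr(incidents):
    """Mean time to repair, in minutes, over (detected, restored) pairs."""
    durations = [restored - detected for detected, restored in incidents]
    return sum(durations, timedelta()) / len(durations) / timedelta(minutes=1)

t0 = datetime(2026, 3, 2, 9, 0)
baseline = [(t0, t0 + timedelta(minutes=90)), (t0, t0 + timedelta(minutes=150))]
pilot    = [(t0, t0 + timedelta(minutes=40)), (t0, t0 + timedelta(minutes=60))]
print(mttr(baseline), mttr(pilot))   # 120.0 50.0
```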
What to ask vendors this week
- What's your topology coverage across RAN, transport, core and cloud? How do you validate and reconcile conflicts?
- How fast can I query "state as of T-3h" across domains? Show latency and cost at my scale.
- How do models consume twin data in real time? What's the rollback if predictions drift?
- What guardrails prevent unsafe changes? Show policy, simulation and blast-radius controls.
- How do you handle data residency, lineage and RBAC? Prove least privilege across teams.
- What's the cost model for ingest, storage, queries and inference under peak events?
- How cleanly do you integrate with ITSM, observability and network controllers I already run?
Risks and reality checks
- Data quality debt: Noisy telemetry and stale inventory wreck graph accuracy. Budget time to clean it.
- Topology drift: Cloud-native updates move faster than change boards. Automate discovery and diff alerts.
- Model drift and alert fatigue: Retrain on fresh incidents and gate actions behind confidence + policy.
- Cost sprawl: Model serving plus high-frequency queries can spike bills. Set sampling and TTLs early.
- Lock-in concerns: Favor open schemas for network assets and export paths for features/labels.
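The model-drift risk above can be screened for cheaply: compare a recent window of prediction confidences (or feature means) against the training-time baseline and raise a retrain flag when the shift is large. This toy mean-shift check, with made-up numbers, stands in for the population-stability or KS tests a production system would use.

```python
from statistics import mean, pstdev

def drift_alert(baseline, recent, z_threshold=3.0):
    """Flag drift when the recent mean deviates from the baseline mean
    by more than z_threshold baseline standard deviations.
    A toy screen -- production systems typically use PSI or KS tests."""
    mu, sigma = mean(baseline), pstdev(baseline)
    if sigma == 0:
        return mean(recent) != mu
    return abs(mean(recent) - mu) / sigma > z_threshold

baseline = [0.80, 0.82, 0.78, 0.81, 0.79]   # training-time confidence scores
steady   = [0.79, 0.81, 0.80]               # live traffic still looks familiar
drifted  = [0.55, 0.50, 0.52]               # model no longer fits live traffic
print(drift_alert(baseline, steady), drift_alert(baseline, drifted))
```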
90-day execution plan
- Weeks 1-2: Pick two use cases and define KPIs. Inventory data sources and owners. Close PII/compliance gaps.
- Weeks 3-6: Stand up a minimal twin with Cloud Spanner Graph + streaming ingest. Wire time-travel queries. Ship twin snapshots into BigQuery for analytics.
- Weeks 7-9: Train a baseline GNN in Vertex AI using labeled incidents. Integrate predictions into ITSM as "suggested actions."
- Weeks 10-12: Pilot automated, low-risk actions with guardrails. Review KPI deltas, cost, and operator feedback. Decide scale-up or iterate.
Bottom line: the pieces for autonomous operations are landing - a live network graph, a unified data layer and predictive models that act before customers call. Start small, wire guardrails and prove the KPIs. Then scale with confidence.