Why AI and automated operations need horizontal telco clouds
AI-driven operations run on continuous data. That's the core reason carriers are moving from vertical stacks to horizontal telco clouds. You still get fully 3GPP-compliant networks, but the way you build and operate them changes completely.
In a horizontal model, the platform becomes the leverage point. Improve the platform once, and every application benefits. That's how you get resilience, speed, and predictable outcomes at scale.
From vertical stacks to horizontal platforms
Operators are breaking silos into layers so the network behaves like a modern platform. As Franz Seiser of Deutsche Telekom puts it, the output stays the same, but the production method is different by design.
- Infrastructure: Pooled compute, storage, and network across sites with consistent security and observability.
- Application: Cloud-native network functions with standard APIs and automation-ready configs.
- Automation: Built-in from day zero. GitOps, policy, intent, and closed-loop controls baked into the platform.
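The GitOps and closed-loop idea in the automation layer can be sketched as a comparison between a declared state and an observed state. This is a minimal illustration, not any particular tool's API: the function name `diff_state`, the component names, and the `replicas`/`image` fields are all assumptions made for the example.

```python
def diff_state(desired, observed):
    """Return the actions needed to make the observed state match the desired one.

    `desired` stands in for what is declared in Git; `observed` for what is
    actually running. Both are plain dicts keyed by component name (hypothetical).
    """
    actions = []
    for name, want in desired.items():
        if name not in observed:
            actions.append(("create", name, want))
        elif observed[name] != want:
            actions.append(("update", name, want))
    for name in observed:
        if name not in desired:
            actions.append(("delete", name, None))
    return actions


# Hypothetical example data
desired = {
    "amf": {"replicas": 3, "image": "amf:1.4.2"},
    "smf": {"replicas": 2, "image": "smf:2.0.1"},
}
observed = {
    "amf": {"replicas": 2, "image": "amf:1.4.2"},
    "legacy-probe": {"replicas": 1, "image": "probe:0.9"},
}

actions = diff_state(desired, observed)
for verb, name, spec in actions:
    print(verb, name, spec)
```

A real controller would run a comparison like this continuously and apply the resulting actions; the sketch only shows the decision step.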
Real-time data is the fuel for automation
Legacy silos often produce a single metric every five minutes. That doesn't feed AI. Modern automation needs a live stream from every component (metrics, events, logs, and traces) without gaps.
Horizontal telco clouds create that continuous pipeline. One shared data plane means improvements roll out once and lift everything, instead of patching each silo separately.
- Streaming telemetry and event buses for sub-second signals.
- Common data models and strict time sync for correlation.
- Platform-level optimizations that instantly benefit all domains.
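To make the point about common data models and time sync concrete, here is a small sketch that puts signals from different sources into one record type and groups them into fixed time windows. The `Signal` record, its field names, and the sample component names are hypothetical; the sketch assumes all timestamps come from synchronized clocks, which is exactly why strict time sync matters.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Signal:
    """Hypothetical common record shared by metrics, events, logs, and traces."""
    source: str                    # emitting component
    kind: str                      # "metric", "event", "log", or "trace"
    ts_ms: int                     # epoch milliseconds from a synchronized clock
    name: str
    value: Optional[float] = None


def correlate(signals, window_ms=1000):
    """Group signals from all sources into fixed-size time windows."""
    buckets = defaultdict(list)
    for s in signals:
        buckets[s.ts_ms // window_ms].append(s)
    return dict(buckets)


def multi_source_windows(signals, window_ms=1000):
    """Return the start times (ms) of windows in which more than one source reported."""
    return sorted(
        window * window_ms
        for window, group in correlate(signals, window_ms).items()
        if len({s.source for s in group}) > 1
    )


# Hypothetical sample data
signals = [
    Signal("upf-1", "metric", 10_050, "packet_loss", 0.02),
    Signal("switch-7", "event", 10_400, "link_flap"),
    Signal("amf-2", "log", 12_900, "registration_timeout"),
]
print(multi_source_windows(signals))  # prints [10000]
```

With five-minute metrics, the packet loss and the link flap above would land in the same coarse bucket as thousands of unrelated signals; with sub-second windows they line up on their own.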
Vendors now deliver components, not silos
This shift changes supplier relationships. Instead of buying "the whole box," operators contract for well-defined pieces of the stack. Nokia, for example, has been adapting its portfolio to be fully cloud-native so that it can operate within shared platforms. Jean Lawrence describes the approach as an industry blueprint for a more IT-oriented, faster-to-change network.
- Contract on APIs, schemas, and SLOs, not monolithic deliverables.
- Enforce conformance via CI gates, automated integration tests, and reproducible environments.
- Security by default: signed artifacts, SBOMs, and least-privilege runtime policies.
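A CI gate for component-level contracts might look like the sketch below. It checks a vendor component against a schema, a latency SLO, and the signing and SBOM requirements. Everything here is an assumption for illustration: the `REQUIRED_FIELDS` schema, the dictionary keys, and the idea that signing and SBOM status arrive as booleans from earlier pipeline steps.

```python
# Hypothetical schema every telemetry record from a component must satisfy
REQUIRED_FIELDS = {"source": str, "ts_ms": int, "name": str}


def contract_violations(component):
    """Return a list of human-readable contract violations (empty means pass)."""
    violations = []
    for i, record in enumerate(component["sample_records"]):
        for field, expected_type in REQUIRED_FIELDS.items():
            if not isinstance(record.get(field), expected_type):
                violations.append(
                    f"record {i}: '{field}' missing or not {expected_type.__name__}"
                )
    if component["observed_p95_ms"] > component["slo_p95_ms"]:
        violations.append("p95 latency exceeds the contracted SLO")
    if not component["artifact_signed"]:
        violations.append("artifact is not signed")
    if not component["has_sbom"]:
        violations.append("SBOM is missing")
    return violations


# Hypothetical components under test
passing = {
    "sample_records": [{"source": "smf-1", "ts_ms": 1_000, "name": "session_count"}],
    "observed_p95_ms": 40, "slo_p95_ms": 50,
    "artifact_signed": True, "has_sbom": True,
}
failing = {
    "sample_records": [{"source": "smf-1", "ts_ms": "1000", "name": "session_count"}],
    "observed_p95_ms": 80, "slo_p95_ms": 50,
    "artifact_signed": True, "has_sbom": False,
}

for label, component in (("passing", passing), ("failing", failing)):
    problems = contract_violations(component)
    print(label, "OK" if not problems else problems)
```

In a pipeline, a non-empty result would fail the build, so conformance is enforced by the gate and not by a review meeting.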
Operating model: what Operations leaders must change
Dismantling silos is as much organizational as it is technical. Teams move from owning "their stack" to owning platform capabilities and shared services. As Seiser notes, "They optimised, of course, their silo. But local optimums are not the global optimum."
The guidance is blunt: enforce the architecture and don't make exceptions. "Don't compromise because as soon as you start with the first compromise … the fifth is coming, and then you are in a not-so-easy to manage environment."
- Build cross-functional platform teams for infrastructure, platform services, security, data, FinOps, and SRE.
- Publish golden paths (paved roads) with opinionated tooling, templates, and automation.
- Adopt SRE: SLOs and error budgets, blameless postmortems, and proactive capacity management.
- Shift to product funding for platforms with clear service taxonomies and chargeback/showback.
- Codify runbooks, approvals, and guardrails as code; minimize manual tickets.
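As one example of a guardrail expressed as code, the sketch below ties automated change approval to the remaining error budget. The 99.9% target, the request counts, and the 25% minimum budget are illustrative assumptions, not recommended values.

```python
def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Fraction of the error budget still unspent (can go negative when overspent)."""
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures <= 0:
        return 0.0
    return 1.0 - failed_requests / allowed_failures


def change_allowed(slo_target, total_requests, failed_requests, min_budget=0.25):
    """Guardrail: permit automated changes only while enough budget remains."""
    remaining = error_budget_remaining(slo_target, total_requests, failed_requests)
    return remaining >= min_budget


# Hypothetical month: 1,000,000 requests against a 99.9% SLO (about 1,000 allowed failures)
print(change_allowed(0.999, 1_000_000, 400))  # roughly 60% of budget left
print(change_allowed(0.999, 1_000_000, 900))  # roughly 10% of budget left
```

The same check can replace a manual approval ticket: the pipeline asks the function, and a human is involved only when the answer is no.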
For hands-on guidance on building AI-enabled workflows and platforms in Ops, see AI for Operations.
Data and reliability KPIs that signal progress
- Data freshness: p50/p95 telemetry latency from source to consumer.
- Coverage: percentage of functions emitting structured metrics, logs, and traces.
- Change failure rate, MTTD, and MTTR for network changes.
- Deployment frequency and lead time for CNFs/VNFs.
- Automated-to-manual change ratio and rollback rate.
- Platform reuse: percentage of services on shared pipelines and golden paths.
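Several of these KPIs reduce to simple arithmetic over records the platform already collects. The sketch below computes telemetry latency percentiles, change failure rate, and MTTR. The nearest-rank percentile method, the field names, and the sample numbers are assumptions chosen for the example; other percentile definitions give slightly different values on small samples.

```python
import math
from statistics import mean


def percentile(values, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def change_failure_rate(changes):
    """Fraction of changes that failed and needed remediation."""
    return sum(1 for c in changes if c["failed"]) / len(changes)


def mttr_minutes(incidents):
    """Mean minutes from detection to restoration."""
    return mean(i["restored_min"] - i["detected_min"] for i in incidents)


# Hypothetical sample data
latencies_ms = [120, 80, 200, 950, 150, 90, 110, 300, 100, 130]
changes = [{"id": n, "failed": n in (3, 6)} for n in range(8)]
incidents = [
    {"detected_min": 0, "restored_min": 30},
    {"detected_min": 100, "restored_min": 150},
]

print("p50 latency (ms):", percentile(latencies_ms, 50))
print("p95 latency (ms):", percentile(latencies_ms, 95))
print("change failure rate:", change_failure_rate(changes))
print("MTTR (min):", mttr_minutes(incidents))
```

Note how a single slow sample dominates p95 while leaving p50 untouched, which is why the freshness KPI tracks both.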
Practical rollout roadmap
- Days 0-90: Inventory current stacks and data flows. Define the reference architecture and API contracts. Stand up a minimal platform (container orchestration, service mesh, observability, GitOps). Pilot one domain (e.g., core).
- Days 90-180: Onboard initial CNFs/VNFs. Establish a streaming data platform. Enforce signing, SBOMs, and policy-as-code. Introduce blue/green or canary for one customer-facing service.
- Days 180-365: Expand to RAN and transport. Industrialize CI/CD, testing, and drift detection. Decommission legacy monitoring where coverage overlaps. Update vendor contracts to component-level SLOs.
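The canary step in the roadmap comes down to an automated promote-or-rollback decision. The following is a simplified sketch of that decision based only on error rates. The thresholds (`max_ratio`, `min_samples`, and the 0.1% floor) are hypothetical defaults, and a production gate would typically look at more signals than errors alone.

```python
def canary_decision(baseline_errors, baseline_total, canary_errors, canary_total,
                    max_ratio=1.5, min_samples=500, floor=0.001):
    """Return "wait", "promote", or "rollback" for a canary release."""
    if canary_total < min_samples:
        return "wait"  # not enough canary traffic to judge yet
    baseline_rate = baseline_errors / baseline_total if baseline_total else 0.0
    canary_rate = canary_errors / canary_total
    # The floor keeps a near-zero baseline from rejecting every canary
    threshold = max(baseline_rate * max_ratio, floor)
    return "promote" if canary_rate <= threshold else "rollback"


# Hypothetical traffic counts
print(canary_decision(50, 100_000, 0, 100))     # too few canary samples
print(canary_decision(50, 100_000, 2, 4_000))   # canary error rate matches baseline
print(canary_decision(50, 100_000, 30, 1_000))  # canary error rate far above baseline
```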
Use proven open-source building blocks from the CNCF ecosystem to avoid tool sprawl and lock-in.
If your engineering teams need to deepen cloud-network skills, explore the AI Learning Path for Network Engineers.
Common risks and how to offset them
- Data silos persist: Mandate a shared schema, centralized catalog, and data contracts; block deployments that bypass them.
- Vendor lock-in: Prioritize open interfaces, multi-vendor test beds, and exit criteria in contracts.
- Tool sprawl: Curate a small, supported platform stack; retire overlapping tools on a schedule.
- Security regression: Shift left with signed builds, image scanning, runtime policy, and continuous posture management.
The bottom line for Operations
A horizontal telco cloud is not a nice-to-have; it's how you make AI and automation reliable at carrier scale. Commit to the architecture, stream real-time data from everything, and bake automation into the platform. Do this without exceptions, and you get stability today with headroom for tomorrow's services.