DOCOMO runs AI on vRAN CPUs: what operations teams need to know
Tokyo, Feb 25, 2026 - DOCOMO has validated AI applications running directly on the general-purpose CPU resources of its commercial vRAN. The takeaway is simple: you can execute meaningful AI workloads alongside live radio processing using the compute you already have.
Why this matters for operations: AI traffic is surging, and GPU supply, power, and cost are hard constraints. This approach shows a practical path to absorb growth, control spend, and keep performance steady by placing the right workload on the right silicon.
What they actually built
- vRAN baseband software from NEC, running on general-purpose servers.
- Virtualization platform from AWS hosting both vRAN and AI applications.
- Targeted accelerator cards from Qualcomm for specific compute tasks.
- Servers from HPE integrating the above stack.
Key result: AI applications executed on CPUs in parallel with live network processing, meaning no dedicated high-end accelerators are required for this class of AI task.
Why this is operationally significant
- Cost control: Use idle CPU cycles across sites before buying more GPUs. Lower capex, better ROI on existing hardware.
- Placement flexibility: Push latency-sensitive, lightweight inference close to radios; reserve GPUs in regional sites for heavy models.
- Energy efficiency: CPUs distributed across many sites can deliver acceptable AI throughput at lower power for many use cases.
- Resilience: Distribute AI across cells or sites to avoid single points of failure and smooth demand spikes.
Which AI workloads fit on CPUs in vRAN
- Telemetry analytics: anomaly detection, RAN KPI drift, predictive alarms.
- Lightweight inference: small LLM classifiers, keyword spotting, simple vision checks for site monitoring.
- Policy/optimization loops: traffic steering hints, energy-saving modes, admission control assistance.
Reserve GPUs for large models, high-throughput vision, or complex multi-modal inference. Think "CPU for breadth, GPU for depth."
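The "CPU for breadth, GPU for depth" split can be sketched as a simple placement heuristic. This is an illustrative sketch only: the thresholds, field names, and the `place` function are assumptions for the example, not values from DOCOMO's validation.

```python
# Hypothetical placement heuristic: route a workload to CPU or GPU based on
# rough model size and throughput attributes. Thresholds are illustrative.
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    model_params_m: float    # model size in millions of parameters
    throughput_rps: float    # required requests per second

def place(w: Workload,
          cpu_max_params_m: float = 500.0,
          cpu_max_rps: float = 50.0) -> str:
    """Return 'cpu' for lightweight, latency-local work; 'gpu' for heavy models."""
    if w.model_params_m <= cpu_max_params_m and w.throughput_rps <= cpu_max_rps:
        return "cpu"   # breadth: run near the radio on spare vRAN cores
    return "gpu"       # depth: schedule on regional accelerator pools

print(place(Workload("kpi-anomaly", 5, 10)))            # small telemetry model
print(place(Workload("multimodal-assist", 7000, 80)))   # large multi-modal model
```

In practice the classification would draw on measured p95 latency and per-site headroom rather than static thresholds, but the shape of the decision is the same.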
Operational patterns that work
- Resource isolation: Pin vRAN threads and IRQs; allocate AI containers to separate cores/NUMA nodes. Protect real-time paths first.
- Admission control: Gate AI jobs on RAN KPIs (e.g., PRB utilization, HARQ BLER) and CPU headroom.
- Latency budgets: Set hard ceilings for AI end-to-end latency to avoid starving vRAN processing.
- Observability: Correlate AI workload metrics with RAN KPIs in the same pane of glass. Alert on deviation from baseline.
- Progressive rollout: Start with off-peak windows and low-traffic cells, then expand.
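The admission-control pattern above can be expressed as a gate that checks RAN KPIs and CPU headroom before scheduling an AI job. A minimal sketch, assuming illustrative KPI names and thresholds (real bounds would come from your own baselines):

```python
# Illustrative admission gate: admit an AI job only when RAN KPIs and CPU
# headroom are inside safe bounds. All thresholds here are assumptions.
def admit_ai_job(prb_utilization: float,   # 0..1 fraction of PRBs in use
                 harq_bler: float,         # 0..1 HARQ block error rate
                 cpu_headroom: float,      # 0..1 idle fraction on AI cores
                 max_prb: float = 0.70,
                 max_bler: float = 0.05,
                 min_headroom: float = 0.30) -> bool:
    """vRAN is the primary tenant: any breached bound rejects the AI job."""
    return (prb_utilization < max_prb
            and harq_bler < max_bler
            and cpu_headroom > min_headroom)

print(admit_ai_job(0.45, 0.01, 0.50))  # quiet cell with headroom
print(admit_ai_job(0.85, 0.01, 0.50))  # busy radio: reject
```

Note the gate is conjunctive: the AI tenant is admitted only when every protective bound holds, which matches the "protect real-time paths first" rule.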
How to pilot this in your network
- Inventory CPU headroom per site and per time-of-day. Identify consistent slack.
- Classify candidate AI tasks by latency, throughput, and model size. Map "CPU-friendly" first.
- Choose orchestration: K8s with explicit CPU pinning and QoS, or VM-based isolation aligned to vRAN threads.
- Set SLOs for both sides: vRAN (throughput, jitter, packet loss, call drop) and AI (p95 latency, success rate, TPS).
- Run A/B baselines. Enable AI, measure deltas, auto-rollback on threshold breach.
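The auto-rollback step can be sketched as a delta check against the pre-AI baseline. The metric names and tolerances below are hypothetical placeholders; in a real pilot they would map to the SLOs you set for the vRAN side.

```python
# Minimal sketch of the A/B rollback check: compare post-enable KPIs against
# the pre-AI baseline and roll back on any breach. Values are illustrative.
def should_rollback(baseline: dict, current: dict, tolerances: dict) -> bool:
    """True if any KPI degraded beyond its allowed delta vs. baseline."""
    return any(current[kpi] - baseline[kpi] > allowed
               for kpi, allowed in tolerances.items())

baseline   = {"sched_latency_ms": 0.8, "packet_loss_pct": 0.02}
tolerances = {"sched_latency_ms": 0.2, "packet_loss_pct": 0.01}

print(should_rollback(baseline,
                      {"sched_latency_ms": 0.9, "packet_loss_pct": 0.02},
                      tolerances))  # within tolerance: keep AI enabled
print(should_rollback(baseline,
                      {"sched_latency_ms": 1.2, "packet_loss_pct": 0.02},
                      tolerances))  # latency breach: roll back
```

Wiring this check into the orchestrator's reconciliation loop makes the rollback automatic rather than a manual on-call decision.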
Metrics to track
- vRAN: PRB utilization, HARQ BLER, scheduling latency, packet loss, jitter, interrupt latency.
- Compute: per-core CPU utilization, cache misses, context switches, NUMA remote memory access.
- AI: p50/p95 inference latency, throughput, model accuracy drift.
- Efficiency: watts per inference, site PUE proxy, TCO per covered Mbps.
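Of the efficiency metrics, watts per inference is the simplest to compute: incremental average power divided by sustained inference rate. The numbers below are illustrative, not measurements from this trial.

```python
# Watts per inference = incremental average power / inference rate.
# Example figures are illustrative only.
def watts_per_inference(avg_power_w: float, inferences_per_s: float) -> float:
    return avg_power_w / inferences_per_s

# e.g. a vRAN server drawing an extra 40 W while serving 200 inferences/s:
print(round(watts_per_inference(40.0, 200.0), 3))  # 0.2 W per inference
```

Tracking this per site, alongside the RAN and compute metrics above, is what lets you compare CPU placement against a regional GPU pool on cost rather than intuition.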
Constraints and risks
- Contention: poor isolation can degrade radio performance. Treat AI as a best-effort tenant.
- Real-time tuning: DPDK, kernel settings, and BIOS power states must be aligned to protect vRAN timing.
- Model lifecycle: monitor drift; schedule updates during low-load windows.
- Vendor mix: integration across RAN, cloud stack, accelerators, and servers adds change-management overhead.
- Data handling: keep sensitive network data compliant when AI runs at the edge.
What's next
DOCOMO plans to optimize CPU/GPU placement per workload and traffic profile, and will showcase the initiative at Mobile World Congress Barcelona 2026. Expect more playbooks on where to run which models and how to schedule them safely.
Bottom line for operations: treat your RAN sites as distributed compute. Put the right AI work on the CPUs you already manage, enforce guardrails, and scale what proves stable and cost-effective.