SoftBank's Infrinia AI Cloud OS: Automating GPU Ops from BIOS to Inference
SoftBank has launched Infrinia AI Cloud OS, a software stack that automates infrastructure operations and inference services on GPU platforms, including Nvidia's GB200 NVL72. The goal is straightforward: reduce operational burden, simplify lifecycle management, and cut total cost of ownership (TCO) for GPU cloud environments.
Infrinia delivers two core services: Kubernetes as a Service (KaaS) and Inference as a Service (Inf-aaS). It handles everything from BIOS and RAID settings to Kubernetes management, GPU drivers, networking, and storage, out of the box.
What Ops Teams Get
Kubernetes as a Service automates the stack so you don't spend cycles on low-level configuration or cluster plumbing.
- End-to-end automation: BIOS, RAID, OS, GPU drivers, networking, Kubernetes controllers, and storage.
- Dynamic reconfiguration of Nvidia NVLink connectivity and memory allocation as clusters are created, updated, or deleted.
- Node allocation based on GPU proximity and NVLink domain setup to reduce latency.
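Proximity-aware node allocation of the kind described above can be sketched as a scoring problem: among nodes with enough free GPUs, prefer those whose GPUs sit in the same NVLink domain as the cluster's existing allocation. The sketch below is illustrative only; the `Node` fields and `pick_node` logic are hypothetical, not Infrinia's actual scheduler.

```python
from dataclasses import dataclass

# Hypothetical model of topology-aware node selection: prefer nodes whose
# GPUs share an NVLink domain with GPUs already allocated to the cluster.
@dataclass
class Node:
    name: str
    nvlink_domain: int   # ID of the NVLink domain this node's GPUs belong to
    free_gpus: int

def pick_node(nodes, preferred_domain, gpus_needed):
    """Best candidate: in-domain nodes first, then most free GPUs."""
    candidates = [n for n in nodes if n.free_gpus >= gpus_needed]
    if not candidates:
        return None
    # Sort key: False (same domain) sorts before True; ties broken by capacity.
    candidates.sort(key=lambda n: (n.nvlink_domain != preferred_domain, -n.free_gpus))
    return candidates[0]

nodes = [
    Node("gpu-a1", nvlink_domain=0, free_gpus=2),
    Node("gpu-b1", nvlink_domain=1, free_gpus=8),
    Node("gpu-a2", nvlink_domain=0, free_gpus=4),
]
best = pick_node(nodes, preferred_domain=0, gpus_needed=4)
print(best.name)  # gpu-a2: the in-domain node with enough free GPUs
```

Keeping a workload inside one NVLink domain avoids crossing slower inter-node links, which is the latency win the bullet above refers to.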
Inference as a Service abstracts deployment so teams can offer LLM inference without touching Kubernetes.
- Select large language models and deploy them via OpenAI-compatible APIs; no cluster tuning required.
- Scales across multiple nodes, including GB200 NVL72 systems.
- Tenant isolation with encrypted communications, automated monitoring and failover, and APIs for portal, customer management, and billing integration.
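Because the service exposes OpenAI-compatible APIs, any standard chat-completions client should work against it. The sketch below shows the request shape using only the Python standard library; the base URL, API key, and model name are placeholders, not Infrinia-specific values.

```python
import json
import urllib.request

# Placeholder endpoint and credentials; substitute your deployment's values.
BASE_URL = "http://localhost:8000/v1"
API_KEY = "sk-example"

def build_chat_request(model, prompt):
    """Build a standard OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )

req = build_chat_request("llama-3-70b", "Summarize NVLink in one sentence.")
# Against a live endpoint, send it and read the reply:
# resp = urllib.request.urlopen(req)
# print(json.load(resp)["choices"][0]["message"]["content"])
```

The point for platform teams: anything speaking this wire format (SDKs, gateways, eval harnesses) can target the service without Kubernetes-level knowledge.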
Why It Matters for Operations
Enterprises wrestle with GPU cluster provisioning, Kubernetes lifecycle management, and inference scaling. According to Charlie Dai, VP and principal analyst at Forrester, SoftBank's automated approach tackles these challenges by handling BIOS-to-Kubernetes configuration, optimizing GPU interconnects, and abstracting inference into API-based services. That lets teams focus on model delivery and SLOs instead of daily firefighting.
SoftBank says the stack aims to lower TCO and operational overhead versus bespoke builds or in-house tools. For Ops leaders, that means fewer custom runbooks to maintain and faster time to service readiness.
Competitive Context
The GPU cloud software market is projected to grow from $8.21 billion in 2025 to $26.62 billion by 2030. SoftBank now competes with hyperscale providers and specialized GPU platforms.
- AWS EKS, Microsoft Azure AKS, and Google Cloud GKE offer managed Kubernetes with GPU support.
- CoreWeave runs roughly 45,000 GPUs and is Nvidia's first Elite-level cloud services provider.
- Lambda Labs reported $425 million in 2024 revenue and lists H100 instances at $2.49/hour.
Dai's take: advantage is shifting from who has GPUs to who can automate orchestration, abstract inference, and streamline the AI lifecycle end to end.
Rollout and Availability
SoftBank plans to deploy Infrinia in its own GPU cloud first, then offer it to external customers and overseas data centers. "The advancement of AI infrastructure requires ... software that integrates these resources and enables them to be delivered flexibly and at scale," said Junichi Miyakawa, SoftBank's president and CEO. Pricing and availability weren't disclosed.
Practical Next Steps for Ops Leaders
- Identify use cases: internal KaaS for multi-tenant teams, external Inf-aaS for product teams, or both.
- Map integration points: portal, customer management, billing, observability, and access control policies.
- Benchmark costs vs. EKS/AKS/GKE or specialized providers (CoreWeave, Lambda Labs, RunPod); include interconnect and data egress in the model.
- Plan GPU topology management: NVLink domains, GPU proximity, and node placement strategies to hit latency targets.
- Run a pilot: validate cluster automation, failover behavior, and scaling under LLM inference load.
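For the cost-benchmarking step above, even a rough model that folds in utilization and data egress changes conclusions versus headline GPU-hour pricing. A minimal sketch, with illustrative placeholder rates (the $2.49/hour figure echoes the Lambda Labs H100 price cited earlier; the hyperscaler numbers are assumptions):

```python
# Rough per-month cost model for comparing GPU providers. All figures are
# illustrative placeholders, not vendor quotes.
def monthly_cost(gpu_hourly, gpus, utilization, egress_tb, egress_per_gb):
    hours = 730 * utilization          # ~730 hours in a month
    compute = gpu_hourly * gpus * hours
    egress = egress_tb * 1000 * egress_per_gb
    return round(compute + egress, 2)

providers = {
    "specialized": {"gpu_hourly": 2.49, "egress_per_gb": 0.00},  # assumed free egress
    "hyperscaler": {"gpu_hourly": 4.10, "egress_per_gb": 0.09},  # assumed rates
}
for name, p in providers.items():
    cost = monthly_cost(p["gpu_hourly"], gpus=8, utilization=0.7,
                        egress_tb=5, egress_per_gb=p["egress_per_gb"])
    print(f"{name}: ${cost:,}/month")
```

Swapping in real quotes, reserved-capacity discounts, and your actual utilization curve turns this into a defensible TCO comparison.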