OCI Zettascale10 connects 800,000 Nvidia GPUs for 16 zettaFLOPS and massive-context AI

Oracle's OCI Zettascale10 links up to 800,000 Nvidia GPUs for 16 zettaFLOPS and will anchor the $500B Stargate project. Orders are open; first availability targets H2 2026.

Published on: Oct 16, 2025

Oracle's OCI Zettascale10: 800,000 Nvidia GPUs linked for massive-context AI

Oracle Cloud Infrastructure announced what it calls the largest AI supercomputer in the cloud: OCI Zettascale10. The multi-gigawatt design links hundreds of thousands of Nvidia GPUs and targets "unprecedented" performance for training and inference. It is set to anchor the $500 billion Stargate project and serve large-scale, industry-specific AI workloads.

"The platform offers benefits such as accelerated performance, enterprise scalability, and operational efficiency attuned to industry-specific AI applications," said Yaz Palanichamy, senior advisory analyst at Info-Tech Research Group.

What Zettascale10 brings

The system stitches together GPU clusters across multiple data centers to deliver a claimed 16 zettaFLOPS of peak performance. One zettaFLOPS is 10^21 floating-point operations per second, three orders of magnitude beyond an exaFLOPS (10^18) system and twelve beyond a gigaFLOPS (10^9) one.
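For a quick sense of scale, here is a back-of-the-envelope comparison in Python. The 16-zettaFLOPS figure is Oracle's claimed peak; the ratios below are simple arithmetic, not measured throughput.

```python
# Back-of-the-envelope scale comparison (peak theoretical numbers only).
GIGA = 1e9    # gigaFLOPS
EXA = 1e18    # exaFLOPS
ZETTA = 1e21  # zettaFLOPS

peak = 16 * ZETTA  # Oracle's claimed peak for OCI Zettascale10

print(f"{peak:.1e} operations per second")        # 1.6e+22
print(f"{peak / EXA:,.0f}x a 1-exaFLOPS system")  # 16,000x
print(f"{peak / GIGA:.1e}x a 1-gigaFLOPS system") # 1.6e+13x
```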

It targets large generative AI use cases, including training and serving large language models. Oracle is taking orders now and expects availability in the second half of 2026, initially supporting deployments with up to 800,000 Nvidia GPUs.

How it works: fabric, latency, and throughput

Oracle introduced new capabilities in its Acceleron networking stack: dedicated network fabrics, converged NICs, and host-level zero-trust packet routing. Oracle says these double network and storage throughput while lowering latency and cost.

The architecture is built on Acceleron RoCE (RDMA over Converged Ethernet) and Nvidia AI infrastructure, aiming for very low GPU-to-GPU latency and better price/performance. The fabric is "wide, shallow, resilient," using switching built into modern GPU NICs to connect to multiple switches simultaneously, each on its own isolated plane. Traffic can be shifted across planes to avoid unstable paths, helping reduce stalls and checkpoint restarts.
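Oracle has not published its plane-selection logic, so the sketch below is only a toy illustration of the idea of steering traffic away from an unstable plane; all class names, fields, and thresholds are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Plane:
    """One isolated network plane reachable from a GPU NIC (illustrative only)."""
    name: str
    error_rate: float = 0.0   # hypothetical health signal, e.g. retransmit ratio

@dataclass
class FabricNIC:
    """Toy model of a NIC attached to several isolated planes."""
    planes: list[Plane] = field(default_factory=list)
    unstable_threshold: float = 0.01

    def pick_plane(self) -> Plane:
        # Prefer planes whose observed error rate is below the threshold;
        # otherwise fall back to the least-bad plane instead of stalling.
        healthy = [p for p in self.planes if p.error_rate < self.unstable_threshold]
        candidates = healthy or self.planes
        return min(candidates, key=lambda p: p.error_rate)

nic = FabricNIC(planes=[Plane("plane-a", 0.002), Plane("plane-b", 0.08), Plane("plane-c", 0.001)])
print(nic.pick_plane().name)  # plane-c: traffic is steered away from unstable plane-b
```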

Clusters sit within large campuses inside a two-kilometer radius and use energy-efficient optics to maximize density. "The highly scalable custom design maximizes fabric-wide performance at gigawatt scale while keeping most of the energy focused on compute," said Peter Hoeschele, VP for infrastructure and industrial compute at OpenAI.

Why this scale matters now

"There are customers for it," said Alvin Nguyen, senior analyst at Forrester, pointing to organizations pushing the limits like OpenAI. Training has moved beyond text into images, audio, and video-data that is heavier and far more compute-intensive. "Inferencing is expected to grow even bigger than the training steps," he noted.

Nguyen cautioned that scale takes time to build, which can create supply risks. "There is a concern in terms of what it means if enterprises don't have enough supply. However, a lot of it is unpredictable." Palanichamy pointed to the Oracle-AMD partnership as a step to balance extreme GPU demand and improve energy efficiency for large-scale training and inference.

Practical takeaways for CIOs, IT, and product teams

  • Right-size first, then scale: Start with smaller clusters and plan an upgrade path. Many use cases don't need the latest GPUs on day one.
  • Tune the stack: Update CUDA/toolchains, kernels, compilers, and runtimes. Apply quantization, sparsity, and mixed precision to cut cost without hurting quality (see the mixed-precision sketch after this list).
  • Design for inference growth: Build distributed serving with caching, token streaming, and autoscaling. Expect inference to dominate budgets (a minimal caching-and-streaming sketch also follows the list).
  • Diversify supply: Explore multiple accelerators (including AMD), reserve capacity early, and keep multi-cloud options open.
  • Engineer for the fabric: Understand RoCE behavior, traffic isolation, and checkpoint strategies to minimize stalls under load.
  • Strengthen FinOps and governance: Set clear SLOs, track train vs. serve spend, and enforce cost guardrails per product line.
  • Upskill teams: Close gaps in distributed training, model serving, and GPU networking. Consider structured learning paths for faster execution.
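
On the "tune the stack" point, a minimal mixed-precision training step in PyTorch looks roughly like the sketch below. It assumes a CUDA device and uses a placeholder model and random data; it is illustrative, not Zettascale10-specific.

```python
import torch
from torch import nn

# Placeholder model and data; swap in your own training setup.
model = nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()          # scales the loss to keep fp16 gradients stable

inputs = torch.randn(32, 1024, device="cuda")
targets = torch.randn(32, 1024, device="cuda")

optimizer.zero_grad(set_to_none=True)
with torch.cuda.amp.autocast():               # runs eligible ops in reduced precision
    loss = nn.functional.mse_loss(model(inputs), targets)
scaler.scale(loss).backward()                 # backprop on the scaled loss
scaler.step(optimizer)                        # unscales gradients, then steps
scaler.update()
print(loss.item())
```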

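On the inference side, the caching-plus-token-streaming pattern can be sketched in plain Python as below; generate_tokens is a stand-in for a real model backend, and the cache policy is deliberately simple.

```python
import hashlib
from collections import OrderedDict
from typing import Iterator, List

class LRUCache:
    """Tiny LRU cache for finished responses (illustrative only)."""
    def __init__(self, max_items: int = 1024):
        self._items = OrderedDict()
        self.max_items = max_items

    def get(self, key: str):
        if key not in self._items:
            return None
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key: str, value: List[str]) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.max_items:
            self._items.popitem(last=False)  # evict least recently used

cache = LRUCache()

def generate_tokens(prompt: str) -> Iterator[str]:
    # Stand-in for a real model backend; yields tokens as they are produced.
    for word in f"echo: {prompt}".split():
        yield word

def stream_response(prompt: str) -> Iterator[str]:
    key = hashlib.sha256(prompt.encode()).hexdigest()
    cached = cache.get(key)
    if cached is not None:
        yield from cached            # repeat prompts skip the model entirely
        return
    produced = []
    for token in generate_tokens(prompt):
        produced.append(token)
        yield token                  # stream each token to the client immediately
    cache.put(key, produced)

print(list(stream_response("hello world")))  # generated
print(list(stream_response("hello world")))  # served from cache
```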
Quick context: RoCE and zettaFLOPS

  • RoCE enables direct memory access across the network, cutting CPU overhead and latency for GPU-to-GPU data flows. See Nvidia's overview for details: Introduction to RoCE.
  • ZettaFLOPS signal sheer scale: 16 zettaFLOPS means the system can run on the order of 10^22 operations per second at peak theoretical throughput.

If you don't need an AI mega-factory

Nguyen's advice: get creative. Most enterprises can improve performance by modernizing software, optimizing models, and improving pipeline efficiency before chasing the biggest clusters.

  • Leverage partner capacity rather than building your own factory.
  • Audit your supply chain and vendor options so you can move quickly when capacity appears.
  • Focus on execution. You don't have to be first; you need to be ready.

Need to upskill teams for large-scale training and efficient inference? Explore curated learning paths at Complete AI Training.

