Amazon and OpenAI ink $38B, seven-year cloud deal to train and run ChatGPT on AWS

AWS and OpenAI struck a seven-year, $38B deal for massive GPU capacity and lower-latency clusters. Teams should book capacity, benchmark GB200/GB300, and keep serving portable.

Published on: Nov 09, 2025

AWS and OpenAI sign US$38B, seven-year deal: what IT and engineering teams should plan for now

AWS and OpenAI have inked a seven-year, US$38 billion agreement that gives OpenAI immediate access to large-scale AWS infrastructure. The companies say OpenAI can run training and inference for models like ChatGPT on "hundreds of thousands" of Nvidia GPUs today, with full capacity coming online through 2026 and optional expansion into 2027 and beyond.

Scope: compute, chips and scale

  • Training and serving on AWS at large scale, beginning immediately.
  • Clusters use Nvidia GB200 and GB300 GPUs, networked via Amazon EC2 UltraServers to reduce latency across interconnected systems.
  • Contract designed to scale to tens of millions of CPUs plus very large GPU fleets.
  • Rollouts staged through 2026 with room to expand thereafter.

This follows OpenAI's move away from single-cloud dependence. Analysts point to a broader trend: model providers locking in long-term capacity across multiple hyperscalers to secure predictable supply, price, and performance. On AWS's side, this lands as it activates very large AI clusters, including "Project Rainier," reported at roughly 500,000 Trainium2 chips for high-throughput training.

Why this matters for engineers and architects

More shared, high-density GPU clusters typically mean better queue times and lower unit costs over time for both training and inference. The EC2 UltraServers fabric targets lower latency between nodes, which you'll feel in multi-node training and large-scale inference fan-out.

For production apps (search, personalization, forecasting, support), the big win is predictable performance at scale. For R&D, shorter experiment cycles and higher token throughput mean faster iteration and cleaner A/B data.

Architecture notes

  • Latency paths: UltraServers aim to reduce interconnect hops. Expect gains in pipeline-parallel and tensor-parallel setups and in retrieval-heavy inference paths.
  • Throughput: GB200/GB300 clusters plus large CPU backplanes help with pre/post-processing, tokenization, and sharding overhead.
  • Reliability: Multi-AZ cluster design and managed placement groups should smooth failover in long-running training runs (a checkpointing sketch follows this list).
  • Scheduling: Staged capacity ramps through 2026; plan reservation windows for critical retrains ahead of peak seasons.
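
Failover only pays off if long runs can resume quickly. Here is a minimal checkpointing sketch, assuming a hypothetical `train_step` helper and an S3 bucket you control (both placeholders); it is not tied to any specific AWS training service:

```python
import io
import boto3
import torch

s3 = boto3.client("s3")
BUCKET = "my-training-checkpoints"  # placeholder bucket name

def save_checkpoint(step, model, optimizer):
    """Serialize training state and push it to S3 so a replacement
    node (or another AZ) can resume after a failover."""
    buf = io.BytesIO()
    torch.save(
        {"step": step,
         "model": model.state_dict(),
         "optimizer": optimizer.state_dict()},
        buf,
    )
    buf.seek(0)
    s3.upload_fileobj(buf, BUCKET, f"run-01/step-{step:08d}.pt")

def train(model, optimizer, data_loader, checkpoint_every=500):
    for step, batch in enumerate(data_loader):
        loss = train_step(model, optimizer, batch)  # hypothetical helper
        if step % checkpoint_every == 0:
            save_checkpoint(step, model, optimizer)
```

How often to checkpoint is a cost/recovery trade-off: more frequent saves mean less recomputation after a node loss but more storage traffic during the run.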

Cost and procurement signals

  • Unit costs: Consolidated training and inference on shared mega-clusters can reduce $/token and $/training step as utilization rises.
  • Commit leverage: Long-term spend commitments across providers are becoming standard. Expect better pricing with workload profiles (training vs. inference) clearly split.
  • FinOps: Track tokens, context lengths, and concurrency as first-class cost drivers. Tie experimentation budgets to model evaluation gates to prevent silent overrun (see the tracking sketch after this list).
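
One way to make tokens a first-class cost driver is a small per-project ledger checked against a budget before each experiment round. A minimal sketch follows; the per-token rates are placeholders, not AWS or OpenAI pricing:

```python
from dataclasses import dataclass, field
from collections import defaultdict

# Placeholder rates in USD per 1K tokens; substitute your negotiated pricing.
RATES = {"prompt": 0.0025, "completion": 0.01}

@dataclass
class UsageLedger:
    """Accumulates estimated spend per project so experiment budgets can be
    checked against evaluation gates before they silently overrun."""
    totals: dict = field(default_factory=lambda: defaultdict(float))

    def record(self, project: str, prompt_tokens: int, completion_tokens: int) -> float:
        cost = (
            (prompt_tokens / 1000) * RATES["prompt"]
            + (completion_tokens / 1000) * RATES["completion"]
        )
        self.totals[project] += cost
        return cost

    def over_budget(self, project: str, budget_usd: float) -> bool:
        return self.totals[project] > budget_usd

ledger = UsageLedger()
ledger.record("rag-eval", prompt_tokens=1200, completion_tokens=300)
print(ledger.over_budget("rag-eval", budget_usd=50.0))
```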

Multi-cloud implications

OpenAI's diversified hosting approach signals a fresh normal: capacity spread across hyperscalers to hedge supply risk and align workloads with specific hardware. For your roadmap, assume portability. Containerize preprocessing, feature stores, and retrieval layers, and keep model-serving interfaces abstracted behind a stable contract.
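
"Abstracted behind a stable contract" can be as simple as a thin protocol that each provider adapter implements. The class and method names below are illustrative, not a specific SDK:

```python
from typing import Protocol

class ChatBackend(Protocol):
    """Stable contract the application codes against; provider adapters
    (AWS-hosted, OpenAI API, on-prem) implement this and can be swapped
    without touching call sites."""
    def generate(self, prompt: str, max_tokens: int = 256) -> str: ...

class EchoBackend:
    """Trivial stand-in adapter used for local tests."""
    def generate(self, prompt: str, max_tokens: int = 256) -> str:
        return prompt[:max_tokens]

def answer(backend: ChatBackend, question: str) -> str:
    # Application logic depends only on the contract, not the provider.
    return backend.generate(f"Answer briefly: {question}")

print(answer(EchoBackend(), "What changed in the AWS-OpenAI deal?"))
```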

If you already run cross-cloud, standardize observability (tracing, token metrics, and queue depth) so you can rebalance traffic quickly when pricing or availability shifts.
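
A minimal sketch of a provider-agnostic metrics record, assuming you already have a trace ID and a way to read queue depth; the field names are illustrative:

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class InferenceMetric:
    """One record per request, emitted identically on every cloud so
    traffic can be rebalanced on comparable numbers."""
    trace_id: str
    provider: str
    region: str
    latency_ms: float
    prompt_tokens: int
    completion_tokens: int
    queue_depth: int
    ts: float

def emit(metric: InferenceMetric) -> None:
    # Replace with your log pipeline or metrics backend of choice.
    print(json.dumps(asdict(metric)))

emit(InferenceMetric(
    trace_id="abc123", provider="aws", region="us-east-1",
    latency_ms=412.7, prompt_tokens=850, completion_tokens=120,
    queue_depth=3, ts=time.time(),
))
```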

Impact on retailers and global enterprises

Large, low-latency GPU clusters open the door to faster semantic search, more granular personalization, tighter demand forecasting, and responsive support flows. The practical shift: move from pilot LLM features to regionally distributed, production-grade services without blowing up latency budgets.

Expect more vendors to offer "bring-your-own-data" agentic workflows on Bedrock and SageMaker with OpenAI models as a native option. Data governance stays in your account while you get access to top-tier inference capacity.
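
If you already use Bedrock, invoking a hosted model from your own account looks roughly like the sketch below, using the boto3 bedrock-runtime Converse API. The model ID is a placeholder; substitute whichever model your account has access to, and check regional availability and supported request options per model:

```python
import boto3

# Placeholder model ID; substitute a model your account is entitled to use.
MODEL_ID = "example.model-id-v1"

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId=MODEL_ID,
    messages=[{"role": "user", "content": [{"text": "Summarize our returns policy."}]}],
    inferenceConfig={"maxTokens": 256, "temperature": 0.2},
)

# Requests and responses flow through your own account, so logging, PII
# handling, and retention stay under your existing governance controls.
print(response["output"]["message"]["content"][0]["text"])
```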

What to do next

  • Benchmark: Run A/B tests across GB200/GB300-backed endpoints vs. your current stack; capture latency p50/p95, tokens/sec, and error rates (a harness sketch follows this list).
  • Right-size contexts: Audit prompt and RAG context lengths; trim tokens where quality holds. Savings compound at scale.
  • Plan reservations: For Q1/Q3 retrains, secure capacity early. Treat training windows like seasonal inventory.
  • Abstract serving: Use a routing layer that can swap providers by policy (cost, latency, geo) without app changes.
  • Governance: Keep PII in-account. Enforce prompt/content filtering and retrieval guards before model calls.
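
For the benchmarking item above, a minimal harness might look like this; `call_endpoint` is a placeholder for however you invoke each candidate stack, and the endpoint names are illustrative:

```python
import time
import statistics

def call_endpoint(name: str, prompt: str) -> str:
    """Placeholder: replace with a real client call for each candidate stack."""
    time.sleep(0.05)  # simulate network + inference latency
    return "ok " * 20

def benchmark(name: str, prompts: list[str]) -> dict:
    latencies, tokens, errors = [], 0, 0
    for p in prompts:
        start = time.perf_counter()
        try:
            out = call_endpoint(name, p)
            tokens += len(out.split())  # crude token proxy for the sketch
        except Exception:
            errors += 1
            continue
        latencies.append((time.perf_counter() - start) * 1000)
    latencies.sort()
    return {
        "endpoint": name,
        "p50_ms": statistics.median(latencies),
        "p95_ms": latencies[int(0.95 * (len(latencies) - 1))],
        "tokens_per_sec": tokens / (sum(latencies) / 1000),
        "error_rate": errors / len(prompts),
    }

prompts = ["test prompt"] * 50
for candidate in ("current-stack", "gb200-endpoint"):
    print(benchmark(candidate, prompts))
```

Run the same prompt set against each endpoint at the same concurrency, and compare distributions rather than single averages before committing traffic.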

Previous steps that set this up

In August 2025, AWS made two OpenAI open-weight models available in Bedrock and SageMaker. Early adopters in media, fitness, and healthcare tested agentic workflows, coding support, and scientific analysis. The new compute deal is positioned to scale those experiments into steady-state production.
