OpenAI’s gpt‑oss Models Bring Open-Weight AI to Cloud and Edge with Azure and Windows Integration

OpenAI’s gpt‑oss models offer open-weight AI that runs on single GPUs or locally, enabling flexible, customizable deployments. Microsoft’s Azure AI Foundry supports fine-tuning and seamless integration across cloud and edge.

Published on: Aug 06, 2025

OpenAI’s gpt‑oss Models: A New Era for AI Development

OpenAI’s launch of the gpt‑oss models marks its first open-weight release since GPT‑2, giving developers and enterprises greater control to run, adapt, and deploy models on their own terms. You can now run gpt‑oss‑120b on a single enterprise GPU, or gpt‑oss‑20b locally. AI has shifted from being just a layer in the technology stack to becoming the stack itself.

This shift demands tools that are open, adaptable, and ready to operate wherever your projects take place—whether that’s in the cloud, at the edge, during early experimentation, or at scale.

Full-Stack AI Development with Microsoft

Microsoft is developing a full-stack AI platform that empowers developers to build and create with AI, not just use it. This vision spans cloud to edge, integrating Azure AI Foundry, Foundry Local, and Windows AI Foundry.

  • Azure AI Foundry offers a unified platform for building, fine-tuning, and deploying intelligent AI agents.
  • Foundry Local brings open-source models to the edge, enabling on-device inferencing across billions of devices.
  • Windows AI Foundry integrates Foundry Local into Windows 11, supporting secure, low-latency local AI development closely aligned with the Windows platform.

Create Intelligent Applications with Azure AI Foundry

Open models have moved beyond niche use cases and now power everything from autonomous agents to domain-specific copilots. Azure AI Foundry supports this momentum by providing infrastructure that allows teams to fine-tune models quickly using parameter-efficient methods such as LoRA and QLoRA.
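To make the idea behind LoRA concrete, here is a toy, framework-free sketch of what LoRA-style adaptation computes: the frozen base weight W stays untouched, and only two small low-rank factors A and B are trained, with their scaled product added as a delta. All sizes and values here are illustrative; real fine-tuning uses a library such as PEFT on the actual model layers.

```python
# Toy illustration of the LoRA idea: instead of updating a full weight
# matrix W, train two low-rank factors A (r x k) and B (d x r) and add
# their scaled product as a delta. At realistic sizes, A and B together
# hold far fewer parameters than W.

def matmul(X, Y):
    """Multiply two matrices represented as lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_weight(W, A, B, alpha, r):
    """Effective weight: W + (alpha / r) * (B @ A)."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# Toy example: d = k = 2, rank r = 1 (sizes chosen for readability).
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight
B = [[0.5], [0.5]]             # trainable low-rank factor (2 x 1)
A = [[1.0, 1.0]]               # trainable low-rank factor (1 x 2)

W_eff = lora_weight(W, A, B, alpha=2.0, r=1)
```

QLoRA follows the same recipe but keeps the frozen base weights in a quantized format to cut memory further, training only the low-rank factors in higher precision.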

With full access to model weights, you can customize models by fine-tuning on proprietary data, applying quantization or structured sparsity, adjusting context lengths, or exporting them for containerized inference on Kubernetes. This flexibility lets you build AI solutions tuned for performance, memory constraints, and security audits.
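As a concrete illustration of one of these techniques, below is a minimal sketch of symmetric int8 weight quantization in plain Python. Production toolchains quantize per channel and use calibration data, so treat this as the idea rather than the implementation.

```python
# Minimal sketch of symmetric int8 quantization: map floats to the
# int8 range [-127, 127] with a single scale, then reconstruct.
# Real pipelines work per channel; this toy version uses one tensor.

def quantize_int8(weights):
    """Return quantized integer values and the shared scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate float weights from int8 values."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.02, 1.0]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)

# Round-trip error is bounded by half the quantization step.
err = max(abs(a - b) for a, b in zip(w, w_hat))
```

The payoff is a 4x memory reduction versus float32 at the cost of a small, bounded reconstruction error, which is why quantization pairs well with memory-constrained single-GPU or on-device deployments.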

Azure AI Foundry delivers training pipelines, weight management, and low-latency serving, helping developers push the boundaries of AI customization with ease.

Meet gpt‑oss: Two Models, Infinite Possibilities

The gpt‑oss models available today on Azure AI Foundry are:

  • gpt‑oss-120b – A 120 billion parameter model with architectural sparsity. It offers powerful reasoning capabilities at a fraction of the size, excelling at complex tasks like math, coding, and domain-specific Q&A. It can run on a single datacenter-class GPU, making it ideal for secure, high-performance deployments where cost and latency matter.
  • gpt‑oss-20b – A lightweight, tool-savvy model optimized for agentic tasks like code execution and tool use. It runs efficiently on a range of Windows hardware, including GPUs with 16GB+ VRAM, and will soon support more devices. This model suits autonomous assistants and AI embedded in real-world workflows, even in bandwidth-constrained settings.
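The "tool use" that gpt‑oss-20b is optimized for reduces, on the host side, to a simple loop: the model emits a structured tool call, and the application parses and dispatches it. The sketch below shows only that dispatch step, with a hard-coded call standing in for a real model response; the tool name and schema are invented for illustration.

```python
import json

# Toy host-side dispatcher for the agentic tool-use pattern. In a real
# loop, tool_call_json would come from the model's output and the
# result would be fed back into the conversation.

TOOLS = {
    "add": lambda a, b: a + b,   # placeholder tool for illustration
}

def dispatch(tool_call_json: str):
    """Parse a model-emitted tool call and run the matching tool."""
    call = json.loads(tool_call_json)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

result = dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}')
```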

Both models will soon support the common Responses API, making it easy to integrate them into existing applications with minimal changes.
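As a rough sketch, a Responses API request pairs a model name with an input. The body below follows the general shape of OpenAI's /v1/responses endpoint; the input text and token limit are placeholder values, and field support for gpt‑oss deployments may differ once the API lands.

```python
import json

# Illustrative Responses API request body. The model name comes from
# the article; the input and max_output_tokens are placeholder values.
payload = {
    "model": "gpt-oss-120b",
    "input": "Summarize the key risks in this incident report.",
    "max_output_tokens": 256,
}

body = json.dumps(payload)      # what an application would POST
decoded = json.loads(body)
```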

Deploying gpt‑oss on Cloud and Edge

Azure AI Foundry is more than a model catalog; it’s a platform for AI builders offering over 11,000 models and growing. With gpt‑oss available, you can:

  • Launch inference endpoints in the cloud with simple CLI commands.
  • Fine-tune and distill models using your own data, then deploy confidently.
  • Combine open and proprietary models to meet specific task requirements.

For on-device scenarios, Foundry Local brings open-source models to Windows AI Foundry, optimized for CPUs, GPUs, and NPUs. This lets you deploy gpt‑oss-20b on modern Windows PCs, keeping your data local while benefiting from advanced AI capabilities.

This hybrid AI approach enables mixing models, optimizing performance and cost, and maintaining control over data location.
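One way to picture that hybrid approach: route each request to the model whose cost and capability fit the task, keeping sensitive or simple traffic on the local 20b model and escalating heavy reasoning to the hosted 120b model. The routing heuristic below is entirely invented for illustration; real routers classify tasks with far more signal.

```python
# Toy model router for a hybrid cloud/edge setup. The keyword-based
# heuristic and the "(local)"/"(cloud)" labels are illustrative only.

HEAVY_KEYWORDS = ("prove", "derive", "multi-step", "analyze")

def pick_model(task: str, contains_sensitive_data: bool = False) -> str:
    """Choose a deployment target for a given task."""
    # Keep sensitive data on-device regardless of task difficulty.
    if contains_sensitive_data:
        return "gpt-oss-20b (local)"
    if any(k in task.lower() for k in HEAVY_KEYWORDS):
        return "gpt-oss-120b (cloud)"
    return "gpt-oss-20b (local)"
```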

Empowering Builders and Decision Makers

Open weights with gpt‑oss provide transparency and flexibility for developers. You can inspect, customize, fine-tune, and deploy models aligned with your needs. For decision makers, these models offer competitive performance without black-box concerns, supporting compliance and cost-efficiency.

A Future of Open and Responsible AI

The release of gpt‑oss within Azure and Windows reflects a commitment to making AI accessible and flexible. Microsoft supports a diverse model portfolio—both proprietary and open—backed by built-in safety and governance tools to maintain trust and compliance across deployments.

Additionally, Microsoft continues to support open tools and standards, exemplified by making the GitHub Copilot Chat extension open source under the MIT license.

This integration of research, product, and platform means that advanced AI capabilities are now available as open tools for everyone, with Azure acting as the bridge to bring them into practical use.

Next Steps and Resources

Get started by deploying gpt‑oss in the cloud today using Azure AI Foundry with just a few CLI commands. Explore the Azure AI Model Catalog to spin up endpoints, or deploy gpt‑oss-20b on your Windows device via Foundry Local. macOS support is coming soon.

Follow the QuickStart guides to learn more and put these models to work.

Pricing

Pricing details are available on the Managed Compute pricing page and are accurate as of August 2025.

