OpenAI and Broadcom unveil Jalapeño LLM inference chip

OpenAI and Broadcom built the Jalapeño AI inference chip in just nine months. It targets gigawatt-scale data center deployments starting in 2026.


Published on: Jun 25, 2026

OpenAI and Broadcom (NASDAQ: AVGO) have unveiled Jalapeño, a custom AI accelerator built specifically for large language model inference. The chip is the first in a multi-generation compute platform and was co-developed from design to tape-out in nine months. According to the companies, early lab testing on workloads including GPT-5.3-Codex-Spark indicates the accelerator will deliver substantially better performance per watt than current state-of-the-art hardware. The companies plan to deploy the platform at gigawatt scale with data center partners beginning in 2026.

Architecture optimized for LLM inference

Jalapeño is a ground-up design for modern LLM inference, not a general-purpose accelerator adapted from earlier AI workloads. OpenAI designed the chip architecture around the kernels, memory movement, networking, and serving patterns that matter most for frontier models. The result, according to early testing, is realized utilization closer to the hardware's theoretical peak performance. Broadcom provided silicon implementation and networking technologies, including Tomahawk networking silicon, while Celestica handled board, rack, and system integration.
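To make "realized utilization" concrete, here is a minimal Python sketch of the ratio the term usually refers to: throughput actually achieved divided by the hardware's theoretical peak. The function name and the figures in the example are hypothetical illustrations, not measurements or specifications of Jalapeño.

```python
def realized_utilization(achieved_flops_per_s: float, peak_flops_per_s: float) -> float:
    """Return achieved throughput as a fraction of theoretical peak throughput."""
    if peak_flops_per_s <= 0:
        raise ValueError("peak_flops_per_s must be positive")
    if achieved_flops_per_s < 0:
        raise ValueError("achieved_flops_per_s must be non-negative")
    return achieved_flops_per_s / peak_flops_per_s


if __name__ == "__main__":
    # Hypothetical numbers for illustration only: an accelerator with a
    # 1000 TFLOP/s peak that sustains 400 TFLOP/s on a serving workload.
    fraction = realized_utilization(400e12, 1000e12)
    print(f"realized utilization: {fraction:.0%}")
```

A chip that sustains a higher fraction of its peak on real serving workloads does more useful work per unit of silicon, which is the sense in which the companies describe the design as running close to theoretical limits.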

"Jalapeño was designed from the ground up for LLM inference using detailed insights from our close collaboration with OpenAI researchers," said Richard Ho, who leads OpenAI's hardware program. "We optimized the architecture around the kernels, memory movement, networking, and serving patterns that matter most for frontier AI models. Based on early testing, Jalapeño will efficiently execute our most important workloads close to the hardware's theoretical limits."

Nine-month tape-out accelerated by AI

The chip moved from initial design to manufacturing tape-out in just nine months, which the companies say is the fastest ASIC development cycle ever achieved in high-performance advanced semiconductors. OpenAI's own models were used to accelerate parts of the design and optimization process. That feedback loop, in which the same models served to users help design the next generation of inference hardware, is central to the full-stack strategy OpenAI describes.

"The world is moving to a compute-powered economy," said Greg Brockman, President and Co-Founder of OpenAI. "Jalapeño is part of our long-term full-stack infrastructure strategy to make compute more abundant, resulting in AI which is faster, more reliable, more affordable for people and businesses, and can be used to solve more important problems. By designing more of the stack ourselves, we can serve more intelligence with greater efficiency and keep pushing advanced AI toward broader access."

First step in a multi-generation platform

Jalapeño is the first accelerator in a roadmap that combines OpenAI-designed silicon with Broadcom's networking and connectivity technologies and Celestica's manufacturing expertise. The platform targets initial deployment by the end of 2026 and will scale over multiple generations. Broadcom President and CEO Hock Tan described the partnership as "a fundamental commitment to scaling the physical infrastructure required for the next decade of AI" and noted that co-developing silicon directly with OpenAI allows gigawatt-scale data center deployments with Microsoft and other partners.

Why this matters for IT and Development

Inference is the moment AI reaches users: every ChatGPT response, every Codex task, every API call. Jalapeño's design focuses on combining high throughput with low latency, which makes it well suited to interactive LLM products at scale. For developers building on OpenAI's APIs, a more efficient inference platform could translate into lower per-request costs and faster response times. For IT teams planning infrastructure, the move toward custom accelerators optimized for LLM workloads signals a shift in how data center hardware will be procured and managed. If OpenAI's claim of substantially better performance per watt holds, organizations running large-scale AI inference could see meaningful reductions in power and cooling demands. The nine-month tape-out also hints at a faster hardware iteration cycle that could accelerate the availability of more efficient compute across the industry.
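The link between performance per watt and power demand is simple arithmetic, sketched below in Python. The companies have not published a specific efficiency figure, so the function names and every number here are hypothetical and serve only to show how the relationship works.

```python
def power_for_throughput(tokens_per_second: float, tokens_per_joule: float) -> float:
    """Return the power in watts needed to sustain a given token throughput."""
    if tokens_per_second < 0:
        raise ValueError("tokens_per_second must be non-negative")
    if tokens_per_joule <= 0:
        raise ValueError("tokens_per_joule must be positive")
    # watts = joules per second = (tokens / s) / (tokens / joule)
    return tokens_per_second / tokens_per_joule


def power_saving_fraction(efficiency_gain: float) -> float:
    """Return the fraction of power saved at fixed throughput.

    efficiency_gain is the ratio of new to old performance per watt,
    for example 2.0 for hardware that is twice as efficient.
    """
    if efficiency_gain <= 0:
        raise ValueError("efficiency_gain must be positive")
    return 1.0 - 1.0 / efficiency_gain


if __name__ == "__main__":
    # Hypothetical workload and efficiency values, for illustration only.
    baseline_watts = power_for_throughput(tokens_per_second=1000.0, tokens_per_joule=2.0)
    improved_watts = power_for_throughput(tokens_per_second=1000.0, tokens_per_joule=4.0)
    print(baseline_watts, improved_watts, power_saving_fraction(2.0))
```

At a fixed workload, power scales with the inverse of efficiency, so a hypothetical doubling of performance per watt would halve the power drawn by the accelerators. Cooling load would fall roughly in step, since nearly all of that power ends up as heat.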
