OpenAI and Broadcom (NASDAQ: AVGO) have unveiled Jalapeño, a custom AI accelerator built specifically for large language model inference. The chip is the first in a multi-generation compute platform and was co-developed from design to tape-out in nine months. Early lab testing, including workloads like GPT-5.3-Codex-Spark, indicates the accelerator will deliver performance per watt substantially better than current state-of-the-art hardware. The companies plan to deploy the platform at gigawatt scale with data center partners beginning in 2026.
Architecture optimized for LLM inference
Jalapeño is a ground-up design for modern LLM inference, not a general-purpose accelerator adapted from earlier AI workloads. OpenAI designed the chip architecture around the kernels, memory movement, networking, and serving patterns that matter most for frontier models. The result, according to early testing, is realized utilization closer to the hardware's theoretical peak performance. Broadcom provided silicon implementation and networking technologies, including Tomahawk networking silicon, while Celestica handled board, rack, and system integration.
"Jalapeño was designed from the ground up for LLM inference using detailed insights from our close collaboration with OpenAI researchers," said Richard Ho, who leads OpenAI's hardware program. "We optimized the architecture around the kernels, memory movement, networking, and serving patterns that matter most for frontier AI models. Based on early testing, Jalapeño will efficiently execute our most important workloads close to the hardware's theoretical limits."
Nine-month tape-out accelerated by AI
The chip moved from initial design to manufacturing tape-out in just nine months, which the companies say is the fastest ASIC development cycle ever achieved in high-performance advanced semiconductors. OpenAI's own models were used to accelerate parts of the design and optimization process. That feedback loop, in which the same models served to users help design the next generation of inference hardware, is central to the full-stack strategy OpenAI describes.
"The world is moving to a compute-powered economy," said Greg Brockman, President and Co-Founder of OpenAI. "Jalapeño is part of our long-term full-stack infrastructure strategy to make compute more abundant, resulting in AI which is faster, more reliable, more affordable for people and businesses, and can be used to solve more important problems. By designing more of the stack ourselves, we can serve more intelligence with greater efficiency and keep pushing advanced AI toward broader access."
First step in a multi-generation platform
Jalapeño is the first accelerator in a roadmap that combines OpenAI-designed silicon with Broadcom's networking and connectivity technologies and Celestica's manufacturing expertise. The platform targets initial deployment by the end of 2026 and will scale over multiple generations. Broadcom President and CEO Hock Tan described the partnership as "a fundamental commitment to scaling the physical infrastructure required for the next decade of AI" and noted that co-developing silicon directly with OpenAI allows gigawatt-scale data center deployments with Microsoft and other partners.
Why this matters for IT and development
Inference is the moment AI reaches users: every ChatGPT response, every Codex task, every API call. Jalapeño's design focuses on combining high throughput with low latency, making it suited for interactive LLM products at scale. For developers building on OpenAI's APIs, a more efficient inference platform could translate into lower per-request costs and faster response times. For IT teams planning infrastructure, the move toward custom accelerators optimized for LLM workloads signals a shift in how data center hardware will be procured and managed. If OpenAI's claim of substantially better performance per watt holds, organizations running large-scale AI inference could see meaningful reductions in power and cooling demands. The nine-month tape-out also hints at a faster hardware iteration cycle that could accelerate the availability of more efficient compute across the industry.