Ex-Googlers' MatX Lands $500M to Ship High-Throughput, Low-Latency LLM Training Chip in 2027

MatX raised $500M to build MatX One, a high-throughput, low-latency LLM chip with a splittable systolic array and SRAM+HBM memory. Built at TSMC, first units land in 2027.

Published on: Feb 27, 2026

MatX raises $500M to build a faster, lower-latency LLM training chip


AI chip startup MatX closed a $500 million Series B led by Situational Awareness, the fund founded by former OpenAI researcher Leopold Aschenbrenner, with Jane Street co-leading. Additional backers include Spark Capital, Triatomic Capital, Harpoon, Alchip Technologies, and Marvell.

Founded in 2024 by former Google engineers Reiner Pope and Mike Gunter, MatX is focused on processors purpose-built for large language models. At Google, Pope worked on AI software while Gunter built hardware; that combined experience now converges in MatX's first product, the MatX One.

What MatX One promises

In a LinkedIn post announcing the round, Pope said MatX One will deliver "much higher throughput than any other chip while also achieving the lowest latency." The company attributes this to a splittable systolic array architecture that partitions compute into smaller arrays for efficiency.
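MatX has not published details of its architecture, but the general idea behind a splittable array can be illustrated in software: a matrix multiply partitioned into independent output tiles, where each tile could in principle be assigned to a separate, smaller compute array. This is a generic sketch of tiled computation, not MatX's design.

```python
def matmul(A, B):
    """Reference dense matrix multiply on nested lists."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def blocked_matmul(A, B, tile=2):
    """Compute A @ B one output tile at a time.

    Each (tile x tile) output block depends only on a row band of A and a
    column band of B, so the blocks are independent units of work that
    could be mapped onto separate smaller compute arrays -- loosely
    analogous to partitioning one large systolic array.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for bi in range(0, n, tile):
        for bj in range(0, m, tile):
            # One independent tile: rows bi..bi+tile, cols bj..bj+tile.
            for i in range(bi, min(bi + tile, n)):
                for j in range(bj, min(bj + tile, m)):
                    C[i][j] = sum(A[i][p] * B[p][j] for p in range(k))
    return C

A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(blocked_matmul(A, B) == A)  # multiplying by identity returns A: True
```

The appeal of such partitioning on real hardware is utilization: small tiles can keep compute busy even when batch sizes or sequence lengths vary, which matches the efficiency claim above.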

Pope also highlighted a design that combines the low latency of SRAM-first approaches with the long-context support of HBM, plus a "fresh take on numerics." According to him, this yields higher throughput on LLMs than any announced system while matching the latency of SRAM-first designs, a formula aimed at "smarter and faster models for your subscription dollar."

Production and timeline

Per reporting from TechCrunch, MatX will manufacture with TSMC and targets initial shipments in 2027.

Why this matters for IT and development teams

  • Throughput vs. latency: Higher tokens-per-second can shrink training cycles and speed up evaluation. Lower latency helps with inference responsiveness and control-plane operations. If both claims hold, you can iterate faster and run more experiments per dollar.
  • Architecture nuance: A splittable systolic array could improve utilization across varied batch sizes and sequence lengths. That may help stabilize performance as workloads shift between pretraining, SFT, and inference.
  • Memory choices: Blending SRAM-first behavior with HBM suggests a push for both snappy kernel execution and longer context windows. Useful if you're training or serving models with expanding context requirements.
  • Numerics: "Fresh" number formats typically mean more aggressive mixed precision. Plan for side-by-side quality validation (eval suites, calibration passes, and regression checks) before committing large training budgets.
  • Software and tooling: Ask about compiler maturity, kernel libraries, framework support (PyTorch, JAX/XLA), and graph-level optimizations. Early silicon lives or dies by the stack.
  • Scaling: Clarify interconnect, cluster topology, and collective comms performance. Training stability at scale matters as much as single-chip FLOPs.
  • Procurement timing: With 2027 as the shipping target, treat 2026 as your diligence window. Line up POCs, set performance gates, and compare TCO against your incumbent accelerators.
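None of these diligence steps are MatX-specific. As one illustration, the numerics validation above can be framed as a simple regression gate: run the same eval suites on your incumbent and candidate numerics, and fail if any metric drops past a threshold. A minimal sketch with made-up scores and a hypothetical harness, not a real API:

```python
def regression_gate(baseline_scores, candidate_scores, max_drop=0.01):
    """Pass only if no eval metric drops by more than max_drop (absolute).

    baseline_scores / candidate_scores: dicts mapping eval-suite name to
    a score in [0, 1], e.g. accuracy. Hypothetical example, not a real API.
    """
    failures = {}
    for name, base in baseline_scores.items():
        drop = base - candidate_scores[name]
        if drop > max_drop:
            failures[name] = drop
    return failures  # empty dict means the candidate numerics pass

# Illustrative numbers only -- not measurements of any real chip.
baseline = {"mmlu": 0.712, "gsm8k": 0.581, "humaneval": 0.464}
candidate = {"mmlu": 0.709, "gsm8k": 0.562, "humaneval": 0.466}
print(list(regression_gate(baseline, candidate)))  # ['gsm8k']
```

Gating on per-suite deltas rather than an aggregate score keeps a large gain on one benchmark from masking a regression on another, which matters when a new number format behaves unevenly across tasks.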

What to watch next

Expect more detail on sustained throughput, latency under real LLM kernels, memory bandwidth, interconnect, and power efficiency. Independent benchmarks and early customer case studies will tell you how these chips behave off the slide deck.


