Google builds inference chips with Marvell to compete with Nvidia
Google is developing AI inference chips in partnership with Marvell Technology, according to Bloomberg. The company plans to announce new tensor processing units (TPUs) at Google Cloud Next in Las Vegas this week, with inference-focused chips expected to follow.
Inference is where trained AI models do their actual work: answering queries and generating outputs. The shift matters: as AI spending grows, the bottleneck is moving from training models to running them at scale.
"The battleground is shifting towards inference," said Gartner analyst Chirag Dekate.
Google Chief Scientist Jeff Dean said the company now sees reason to specialize chips more narrowly. "As AI demand grows, it now becomes sensible to specialize chips more for training or more for inference workloads," he said.
Why Google has structural advantages
Google's move into inference chips builds on years of in-house semiconductor development. The company controls both the AI models and the hardware running them, a tight feedback loop no other major AI developer matches at comparable scale.
Google also has search revenue to fund chip development and can steer TPU access toward its own priorities. Meta recently signed a multibillion-dollar deal to procure TPUs through Google Cloud, citing potential performance gains on inference tasks.
Anthropic expanded its TPU access to as much as 1 million chips and separately signed a deal with Broadcom, Google's TPU manufacturing partner, for chips providing roughly 3.5 gigawatts of computing power starting in 2027.
Google has begun letting enterprise customers deploy TPU hardware on-premises rather than relying solely on cloud infrastructure, and opened TPU access to outside tools like PyTorch.
Supply constraints and Nvidia's edge
Supply remains a real problem. Chip scarcity limits access for many companies, and available TPU supply is being steered toward major AI organizations.
Nvidia still leads in AI chips, particularly for training. Nvidia CEO Jensen Huang said at the company's GTC conference that its chips can handle applications "you can't do with TPUs." Google itself uses both TPUs and Nvidia GPUs for its own AI work.
For IT and development teams, understanding the shift toward inference matters: it affects where workloads run, how infrastructure is sized, and which hardware investments make sense.