Companies Cut AI Spending by Routing Tasks to Cheaper Models
Large enterprises are shifting away from running all queries through their most expensive AI models. Instead, they're matching each task to the right tool, a practice called model routing, to control spiraling AI costs that have blown past budgets.
The change threatens the business model of OpenAI and Anthropic, whose valuations depend on sustained demand for premium-priced models across the board.
The math behind the shift
For the past two years, companies defaulted to frontier models regardless of task complexity. Now, with AI bills running far ahead of forecasts, chief financial officers and boards are demanding efficiency.
Cisco's chief product officer Jeetu Patel laid out the numbers. At roughly $200 of token usage per employee per week, that's about $10,000 annually per person. Across Cisco's 90,000 employees, the annual bill reaches $900 million.
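The arithmetic behind Patel's figures can be checked with a short sketch. The weekly spend and headcount come from the article; the 52-week year is an assumption, which is why the unrounded results land slightly above the rounded $10,000 and $900 million quoted.

```python
# Figures quoted by Patel in the article.
WEEKLY_TOKEN_SPEND_PER_EMPLOYEE = 200  # USD per employee per week
EMPLOYEES = 90_000

# Assumption (not from the article): spending continues all 52 weeks of the year.
WEEKS_PER_YEAR = 52


def annual_spend_per_employee(weekly_spend, weeks=WEEKS_PER_YEAR):
    """Annual token spend for one employee, in USD."""
    return weekly_spend * weeks


def annual_company_spend(weekly_spend, employees, weeks=WEEKS_PER_YEAR):
    """Annual token spend across the whole workforce, in USD."""
    return annual_spend_per_employee(weekly_spend, weeks) * employees


per_person = annual_spend_per_employee(WEEKLY_TOKEN_SPEND_PER_EMPLOYEE)
total = annual_company_spend(WEEKLY_TOKEN_SPEND_PER_EMPLOYEE, EMPLOYEES)

print(f"Per employee: ${per_person:,}")  # Per employee: $10,400
print(f"Company-wide: ${total:,}")       # Company-wide: $936,000,000
```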
Cisco came in well over budget and has had to adjust. The company is now prioritizing tokens over other spending while 30,000 engineers build products largely with AI.
How routing works and what it saves
Model routing sends difficult problems to expensive frontier models and simple ones to cheaper alternatives. The cost savings are substantial.
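A minimal sketch of the idea follows. Everything in it is hypothetical and for illustration only: the model names, the per-token prices, the keyword list, and the difficulty heuristic are assumptions, not how any vendor or company named here actually routes requests. Production routers typically rely on a trained classifier or a small model to judge difficulty.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_tokens: float  # USD; hypothetical price, not a real quote


# Hypothetical models and prices, chosen only to illustrate the gap.
FRONTIER = Model("frontier-model", 0.03)
BUDGET = Model("budget-model", 0.003)

# Hypothetical keywords that hint at harder engineering work.
HARD_KEYWORDS = ("refactor", "prove", "architecture", "debug")


def estimate_difficulty(prompt: str) -> float:
    """Toy heuristic returning a difficulty score between 0 and 1."""
    text = prompt.lower()
    words = set(re.findall(r"[a-z']+", text))
    keyword_hits = sum(keyword in words for keyword in HARD_KEYWORDS)
    length_score = min(len(text.split()) / 200, 1.0)  # longer prompts score higher
    return min(1.0, 0.4 * keyword_hits + length_score)


def route(prompt: str, threshold: float = 0.4) -> Model:
    """Send hard prompts to the frontier model, everything else to the budget one."""
    return FRONTIER if estimate_difficulty(prompt) >= threshold else BUDGET


easy = route("Who was the third U.S. president?")
hard = route("Debug and refactor the payment service architecture")
print(easy.name)  # budget-model
print(hard.name)  # frontier-model
```

The threshold is the main tuning knob in a design like this: raising it sends more traffic to the cheaper model at the risk of weaker answers on borderline tasks.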
Scott Wu, CEO of Cognition (which makes the coding agent Devin), said companies can achieve five to ten times better cost efficiency on routine work using models that are adequate for the task.
Consider a basic question: Who was the third U.S. president? Every model, expensive or cheap, returns the same answer: Thomas Jefferson. Yet most enterprises still send such queries to their most expensive option.
Arvind Jain, CEO of Glean, estimates that roughly 95% of enterprise AI usage still runs on the most expensive frontier models, even for tasks cheaper alternatives could handle.
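To see how these two estimates interact, here is a rough back-of-the-envelope sketch. It assumes every task uses the same number of tokens and reads Wu's "five to ten times" as a straight price ratio between the frontier model and a cheaper one. The 30% target share is purely hypothetical and does not come from anyone quoted here.

```python
def blended_cost(frontier_share: float, price_ratio: float) -> float:
    """Average cost per task, relative to sending everything to the frontier model.

    frontier_share: fraction of tasks sent to the frontier model (0 to 1).
    price_ratio: how many times cheaper the alternative model is (assumed).
    """
    if not 0.0 <= frontier_share <= 1.0:
        raise ValueError("frontier_share must be between 0 and 1")
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    return frontier_share + (1.0 - frontier_share) / price_ratio


CURRENT_FRONTIER_SHARE = 0.95  # Jain's estimate from the article
TARGET_FRONTIER_SHARE = 0.30   # hypothetical share after routing, for illustration

for ratio in (5, 10):  # the two ends of Wu's range
    before = blended_cost(CURRENT_FRONTIER_SHARE, ratio)
    after = blended_cost(TARGET_FRONTIER_SHARE, ratio)
    saving = 1.0 - after / before
    print(f"{ratio}x cheaper model: spend falls by about {saving:.0%}")
```

Under these assumptions the saving is driven mostly by how much traffic can leave the frontier model, not by exactly how cheap the alternative is.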
Vendors respond to cost pressure
AI companies recognize the anxiety. Cognition announced an "AI productivity guarantee" that refunds customers if Devin delivers less engineering value than they're paying for, up to $10 million in usage, until performance matches expectations.
Wu framed this as addressing a metric that has plagued the industry: return on investment. Rather than measuring activity like tokens consumed or lines of code written, Cognition estimates the actual engineering hours its agent saves and backs that estimate with a refund.
"You can spend billions of tokens and be doing nothing with it," Wu said. Companies should pursue output, not activity.
The valuation question
If enterprises steer high-volume, easy work to cheaper open-source models, OpenAI and Anthropic stop getting paid for every task. They only earn revenue on complex jobs.
Both companies have built their businesses and IPO expectations around enormous demand at premium prices. That assumption is now in question.
Patel doesn't think model routing sinks the frontier labs. Cutting-edge technology will retain value, he said. But he sees the pricing model shifting. The labs will need to get more efficient rather than simply charge more, a change Patel predicts will become an industry-wide effort.
Frontier models will still command a premium for the hardest work. The open question is how much of the market consists of easier tasks. The answer could significantly affect the valuations of leading AI companies.
For finance leaders managing AI budgets, this shift represents an opportunity to align spending with actual business value.