Amazon Kills Internal AI Leaderboard After Staff Gamed the System
Amazon has removed an internal AI usage leaderboard after employees inflated token consumption on low-value tasks, forcing the company to rethink how it measures AI adoption. The initiative, called Kirorank, ranked staff based on how often they used AI tools. Instead of driving meaningful adoption, it created perverse incentives that wasted computing resources.
Dave Treadwell, Senior Vice President at Amazon, told staff the leaderboard encouraged "tokenmaxxing" - inflating AI token consumption regardless of business value. "Please do not use AI just for the sake of using AI," he said.
Tokens are the chunks of text an AI model reads and generates. Each meaningless task consumes computing capacity Amazon must pay for, turning the scoreboard into overhead rather than a signal of useful work.
How the Incentive Failed
Employees reportedly assigned AI agents - autonomous bots that act on a user's behalf - to pointless tasks to climb the rankings. Some used internal tools like Kiro and MeshClaw to generate additional AI activity without shipping real code or delivering customer value.
Amazon had set targets requiring more than 80% of developers to use AI weekly. Without clear links to business outcomes, the metric invited activity that looked productive but created little value.
The financial stakes are high. Amazon expects to spend roughly $200 billion in capital expenditure, mostly on AI and data centre infrastructure, and rising compute costs make token waste expensive. Anthropic, whose models Amazon uses extensively, has shifted from flat monthly fees to metered usage, which increases bills for heavy users.
Meta employees similarly attempted to game internal ranking tables, according to reporting on the issue.
The Replacement: Code Over Tokens
Amazon now tracks "normalised deployments" - evidence that engineers regularly use AI to create useful code that ships to production. Treadwell instructed staff to focus on building better products and shipping improvements customers notice, not on burning tokens.
Other leaders are adopting the same view. Ravi Kumar S, CEO at Cognizant, called token consumption a "vanity metric," saying the company measures results over usage.
Measuring deployments rewards teams for merging AI-assisted code into production rather than running experiments that never ship. It encourages thoughtful integration of AI into the software lifecycle.
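The difference between the two scoreboards can be sketched in a few lines of Python. This is an illustrative sketch only: the record fields, the sample figures, and the scoring rule are assumptions made for the example, and Amazon has not published how "normalised deployments" are actually calculated.

```python
from dataclasses import dataclass


@dataclass
class EngineerWeek:
    """One engineer's week. All fields are hypothetical, for illustration."""
    name: str
    tokens_used: int          # AI tokens consumed during the week
    ai_assisted_deploys: int  # AI-assisted changes that reached production


def rank_by_tokens(rows):
    """Usage-style ranking: whoever burns the most tokens comes first."""
    ordered = sorted(rows, key=lambda r: r.tokens_used, reverse=True)
    return [r.name for r in ordered]


def rank_by_shipped_work(rows):
    """Outcome-style ranking: whoever ships the most AI-assisted changes comes first."""
    ordered = sorted(rows, key=lambda r: r.ai_assisted_deploys, reverse=True)
    return [r.name for r in ordered]


# Made-up sample data, not real figures.
rows = [
    EngineerWeek("alice", tokens_used=9_000_000, ai_assisted_deploys=0),
    EngineerWeek("bob", tokens_used=400_000, ai_assisted_deploys=5),
    EngineerWeek("chen", tokens_used=1_200_000, ai_assisted_deploys=3),
]

print(rank_by_tokens(rows))        # ['alice', 'chen', 'bob']
print(rank_by_shipped_work(rows))  # ['bob', 'chen', 'alice']
```

In this made-up data, the engineer who tops the token ranking ships nothing, which is the behaviour a usage scoreboard rewards and a deployment-based one does not.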
What This Means for Managers and HR Leaders
This is a textbook incentive-design failure. Mandate a behaviour, attach a public scoreboard, and people will deliver the number whether or not it creates value.
The fix is straightforward: measure the outcome the business actually wants. In this case, that meant working code rather than tokens burned. When incentives align with real business goals, there is far less to gain from gaming the system.
Three principles emerge for management teams adopting new tools:
- Define high-value use cases. Be specific about where AI should solve problems, not just where it can be used.
- Align targets with delivery milestones. Tie adoption metrics to code shipped, features launched, or problems solved - not activity metrics.
- Validate impact after deployment. Measure whether AI-assisted work actually improves speed, quality, or customer outcomes.
Clear ownership matters too. If dashboards drive behaviour, they must be approved, audited, and linked to goals that affect customers and the bottom line. Unofficial tools can drift from leadership intent, as Amazon's experience shows.
Human resources teams should note the broader lesson: adoption quality beats adoption quantity. A smaller group using AI thoughtfully on high-impact work creates more value than widespread usage on marginal tasks.
Amazon's shift from Kirorank to deployment tracking reflects a wider industry move toward outcomes-based measurement. As AI becomes infrastructure rather than novelty, the pressure to adopt matters less than the discipline to adopt well.