Amazon's "Tokenmaxxing" Scandal Exposes a Fundamental HR Design Problem
Amazon employees are running unnecessary tasks through an internal AI tool to inflate their scores on a company leaderboard tracking AI usage. The practice, called "tokenmaxxing," was reported by the Financial Times and has since become a case study across the technology industry. It demonstrates what happens when a productivity metric becomes a target: the metric stops measuring productivity and starts measuring the human capacity to game it.
The episode matters to HR leaders designing or implementing AI adoption programs. It is a live demonstration of a failure mode that is easy to build and hard to reverse.
What Amazon Built, and What Went Wrong
Amazon deployed MeshClaw, an in-house AI tool that allows employees to create software agents capable of connecting to workplace systems and completing tasks. Those agents can initiate code deployments, triage emails, and interact with applications including Slack. More than three dozen engineers worked on the tool.
Amazon also introduced targets requiring more than 80 percent of its developer workforce to use AI tools each week, and began tracking token consumption - the units of text an AI model processes as input and output - on internal leaderboards. Team-wide statistics were initially visible to all staff before being restricted.
The result was predictable. "There is just so much pressure to use these tools," one Amazon employee told the Financial Times. "Some people are just using MeshClaw to maximise their token usage." Another said managers were watching the data regardless of official policy. Amazon told the Financial Times that token statistics would not be used in performance evaluations. Workers did not believe it.
The Pattern Across Silicon Valley
Amazon is not alone. Meta employees engaged in similar tokenmaxxing behaviour, competing on an internal leaderboard called "Claudeonomics" that ranked roughly 85,000 workers by token consumption. In a 30-day window, total usage exceeded 60 trillion tokens. The leaderboard was taken down after reporting by The Information, but Meta's CTO publicly endorsed the underlying logic.
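For a sense of scale, a back-of-envelope calculation helps. The totals are the reported figures; the per-worker division is ours:

```python
# Back-of-envelope arithmetic on the reported Meta figures. The totals
# come from the reporting; the per-worker breakdown is our own division.
total_tokens = 60e12       # ~60 trillion tokens in the 30-day window
workers = 85_000           # headcount ranked on "Claudeonomics"
days = 30

per_worker = total_tokens / workers   # ~706 million tokens per worker
per_day = per_worker / days           # ~23.5 million tokens per day

print(f"{per_worker:,.0f} tokens per worker over 30 days")
print(f"{per_day:,.0f} tokens per worker per day")
```

Roughly 700 million tokens per ranked worker in a month - a number that says far more about what the leaderboard incentivised than about what anyone produced.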
At Microsoft, the president sent an internal memo saying AI use was "no longer optional, it's core to every role and every level." A company spokesperson later clarified there was "no formal review of an employee's AI usage" - the kind of clarification issued when the original message lands harder than intended.
Almost every Fortune 500 company is now tracking overall AI usage, with tokens, prompt counts, license activations, and seat-utilisation rates becoming standard surveillance inputs. The financial stakes are staggering. Combined 2026 capital expenditure from Amazon, Microsoft, Alphabet, and Meta is tracking between $650 billion and $700 billion. Every executive leading a company that has made those commitments has an investor relations problem if adoption numbers look weak.
This Is an HR Design Problem
The Amazon story exemplifies Goodhart's Law: when a measure becomes a target, it ceases to be a good measure. The moment token consumption was tied to leaderboards that managers could see, it stopped measuring AI productivity and started measuring competitive anxiety. HR leaders designed this. Not maliciously - but the incentive structure that produced tokenmaxxing is a people management structure, not a technology one.
Weekly usage targets, visible leaderboards, and ambiguous signals about whether the numbers feed into performance reviews are HR design choices. They have produced a predictable human response.
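A toy model makes the mechanism concrete. This sketch is our own illustration, not anything from the reporting, and every rate in it is hypothetical; the point is only that once the ranking is on tokens, the gamer beats the worker:

```python
# A toy Goodhart's Law model (our own illustration; all rates are
# hypothetical). Each employee splits a fixed week between real work
# and metric-gaming: real work produces value and some tokens; gaming
# produces tokens only, but far more of them per hour.
HOURS = 40
TOKENS_PER_REAL_HOUR = 1_000     # hypothetical: tokens from genuine use
TOKENS_PER_GAMED_HOUR = 10_000   # hypothetical: tokens from make-work
VALUE_PER_REAL_HOUR = 1.0        # hypothetical: units of real output

def score(real_hours: float) -> tuple[float, float]:
    """Return (leaderboard tokens, actual value) for a given split."""
    gamed_hours = HOURS - real_hours
    tokens = (real_hours * TOKENS_PER_REAL_HOUR
              + gamed_hours * TOKENS_PER_GAMED_HOUR)
    return tokens, real_hours * VALUE_PER_REAL_HOUR

for label, real in [("all real work", 40), ("mostly gaming", 10)]:
    tokens, value = score(real)
    print(f"{label:>14}: {tokens:>9,.0f} tokens, {value:>4.0f} value")
```

Under these made-up rates, the employee who spends three-quarters of the week on make-work posts nearly eight times the tokens of the honest worker while producing a quarter of the value. Rank by tokens, and the leaderboard crowns the wrong person every time.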
Executives believe refusing to adopt AI is a greater threat to someone's job than AI itself. In that environment, an employee confronted with a leaderboard and an 80 percent usage target is not making a free choice about whether to adopt the technology. They are responding to a threat.
Research shows that only four percent of employers report employee resistance as a barrier to AI adoption - yet nearly a quarter of workers said they would consider leaving a job if forced to use AI tools in ways they did not support. The gap between those two numbers describes a single dynamic: employees are complying visibly and resisting quietly. Tokenmaxxing is simply a more industrious version of that quiet resistance.
The Measurement Problem Is Also a Security Problem
Multiple Amazon employees told the Financial Times they were alarmed by the security profile of MeshClaw itself. The tool was granted permission to act on a user's behalf - initiating code deployments, interacting with internal systems, sending communications. One employee said: "The default security posture terrifies me. I'm not about to let it go off and just do its own thing."
This concern sits alongside the gaming problem rather than beneath it. An AI agent that employees are running on unnecessary tasks to inflate usage scores is an agent taking real actions in real systems - creating code deployments that did not need to happen, sending emails that did not need to be sent.
Seventy percent of managers observed at least one AI-related error from a direct report in the previous 12 months. Add leaderboard pressure that rewards running AI more, not running it better, and the error rate compounds. Even among companies seeing productivity gains from AI, roughly 37 percent of time saved is being consumed by rework - for every 10 hours gained, nearly four are lost correcting AI outputs.
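The arithmetic in that last figure is worth spelling out - a trivial restatement of the numbers quoted above:

```python
# The rework arithmetic from the paragraph above, made explicit.
gross_hours_saved = 10.0     # hours gained from AI assistance
rework_fraction = 0.37       # share of saved time consumed by rework

rework_hours = gross_hours_saved * rework_fraction    # ~3.7 hours
net_hours_saved = gross_hours_saved - rework_hours    # ~6.3 hours

print(f"of every {gross_hours_saved:.0f} hours saved, {rework_hours:.1f} "
      f"go to rework, leaving {net_hours_saved:.1f} of real gain")
```

A token leaderboard counts the full ten hours of "usage" and never sees the four hours of correction.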
What HR Should Do Differently
Measuring usage is not measuring value. Token counts, weekly active users, and seat-utilisation rates tell you whether employees are running the tools. They tell you nothing about whether the tools are producing better work. If the KPI is token consumption, rework is invisible.
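What would an outcome-oriented alternative look like? As a sketch - every field name and number below is hypothetical, not any company's actual instrumentation - a metric that nets rework against time saved makes the cost visible instead of hiding it:

```python
# One possible shape of an outcome-adjusted metric. A sketch, not a
# prescription: all fields and figures here are hypothetical. The point
# is that once rework is recorded as a cost, it stops being invisible.
from dataclasses import dataclass

@dataclass
class TeamMonth:
    tokens_used: int       # what the leaderboard tracked
    hours_saved: float     # measured or self-reported time savings
    rework_hours: float    # time spent correcting AI output

    @property
    def net_hours(self) -> float:
        return self.hours_saved - self.rework_hours

    @property
    def net_hours_per_m_tokens(self) -> float:
        return self.net_hours / (self.tokens_used / 1e6)

team = TeamMonth(tokens_used=50_000_000, hours_saved=120.0, rework_hours=44.0)
print(f"{team.net_hours:.0f} net hours; "
      f"{team.net_hours_per_m_tokens:.2f} net hours per million tokens")
```

A team that burns tokens on make-work scores worse on this metric, not better - the gaming incentive inverts.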
Leaderboards drive performance theatre, not performance. The Wells Fargo fake-accounts scandal was a leaderboard problem before it was a compliance problem. Aggressive sales targets tied to evaluation produced the appearance of cross-selling success regardless of whether customers wanted the products. The mechanism at Amazon is structurally identical, scaled to AI and played out in tokens rather than accounts.
Ambiguity about whether metrics feed into reviews is the problem, not the solution. Amazon told employees that token statistics would not inform performance evaluations. Employees did not believe it and behaved accordingly. Employees act on what they believe management is watching, not on what policy documents say. If there is any possibility that a metric feeds into decisions about someone's career, they will optimise for it.
Transparency changes behaviour. Ninety-two percent of desk workers in organisations with a clearly communicated AI strategy reported productivity gains. The companies performing best on AI adoption are not the ones with the most aggressive targets - they are the ones in which employees understand why the technology is being deployed and what it is expected to produce.
More managers than ever believe AI can replace their direct reports. The share who agreed that replacing employees with AI tools was a good thing rose from 23 percent in 2025 to 35 percent in 2026. In that environment, an employee who sees a leaderboard tracking their AI usage is not imagining the threat.
The HR response to tokenmaxxing is not to take down the leaderboard and move on. It is to ask what the leaderboard communicated about what the organisation values - and whether that is actually what you want employees to believe. Amazon spent $200 billion this year to make AI central to how its employees work. The tokenmaxxing problem came free, with the leaderboard.
For HR leaders implementing AI adoption programs, consider reviewing the AI for HR Managers learning path to understand how to design metrics that measure outcomes rather than activity.