Amazon's token leaderboards produce gaming behaviour instead of AI productivity gains

Amazon employees are gaming an internal AI leaderboard by running pointless tasks to inflate their token scores - a practice workers call "tokenmaxxing." It's a direct result of tying weekly usage targets to visible manager dashboards.

Published on: May 13, 2026

Amazon's tokenmaxxing scandal reveals the real cost of measuring AI adoption instead of AI value

Amazon employees are running unnecessary tasks through an internal AI tool not because the work needs doing, but because the activity inflates their scores on a company leaderboard tracking artificial intelligence usage. The practice, which workers call "tokenmaxxing," exposes what happens when a productivity metric becomes a target: the metric stops measuring productivity and starts measuring the human capacity for gaming.

The Financial Times reported the practice this week, and it has since circulated across the technology industry as a case study. For HR leaders currently designing or implementing AI adoption programs, it is a live demonstration of a failure mode that is easy to build and hard to reverse.

What Amazon built

Amazon deployed MeshClaw, an in-house AI product that allows employees to create software agents capable of connecting to workplace tools and completing tasks on a user's behalf. The bot can initiate code deployments, triage emails and interact with applications including Slack.

The company positioned it as empowering "thousands of Amazonians to automate repetitive tasks each day." More than three dozen Amazon engineers worked on the tool.

Amazon then introduced targets requiring more than 80 per cent of its developer workforce to use AI tools each week. The company began tracking token consumption - the units of data processed by AI models, essentially a meter of how much the tools are being run - on internal leaderboards. Team-wide statistics were initially visible to all staff before being restricted so only employees and their managers could view them.

The predictable result

"There is just so much pressure to use these tools," one Amazon employee told the Financial Times. "Some people are just using MeshClaw to maximise their token usage."

Another said the data was being watched regardless of official policy: "Managers are looking at it. When they track usage it creates perverse incentives and some people are very competitive about it."

Amazon told staff that token statistics would not be used in performance evaluations. Workers did not believe it.

The pattern spreads

Amazon is not alone. Meta employees engaged in similar behaviour, competing on an internal leaderboard called "Claudeonomics" that ranked the company's roughly 85,000 workers by token consumption. In a 30-day window, total usage exceeded 60 trillion tokens. The leaderboard was taken down after reporting, but Meta's CTO Andrew Bosworth publicly endorsed the underlying logic - pointing to his best engineer spending the equivalent of their salary in AI tokens as evidence of a productivity multiplier.

At Microsoft, a senior leader sent an internal memo stating AI use was "no longer optional, it's core to every role and every level." A company spokesperson later clarified there was "no formal review of an employee's AI usage" - the kind of clarification issued when an original message lands harder than intended.

A May 2026 CNBC report noted that "almost every Fortune 500 is tracking overall AI usage," with tokens, prompt counts, licence activations and seat-utilisation rates becoming standard surveillance inputs alongside older metrics like badge-swipe and keyboard activity.

The financial pressure behind the numbers

The financial stakes are enormous. Combined 2026 capital expenditure from Amazon, Microsoft, Alphabet and Meta is tracking between $650 billion and $700 billion, with some Wall Street projections exceeding $1 trillion for 2027. Every executive who has made those commitments has an investor relations problem if adoption numbers look weak.

Token counts are the answer - unless employees are manufacturing the counts themselves.

This is a people management problem, not a technology problem

The Amazon story is being described as a textbook case of Goodhart's Law: the principle that when a measure becomes a target, it ceases to be a good measure. The moment token consumption was tied to leaderboards that managers could see, it stopped measuring AI productivity and started measuring competitive anxiety.

HR leaders designed this. Not maliciously - but the incentive structure that produced tokenmaxxing is a people management structure, not a technology one. Weekly usage targets, visible leaderboards, ambiguous signals about whether the numbers feed into performance reviews: these are HR design choices, and they have produced a predictable human response.

Only 4 per cent of employers see employee resistance as a barrier to AI adoption. Yet nearly a quarter of workers say they would consider leaving a job if forced to use AI tools in ways they did not support. The gap between those two numbers describes the same dynamic playing out at Amazon: employees complying visibly and resisting quietly. Tokenmaxxing is simply a more industrious version of that quiet resistance.

The Australian legal dimension

Under the Fair Work Act, employees have rights to consultation on major workplace changes, including changes to the way work is performed. The deployment of AI tools that materially alter how performance is measured - including through token consumption leaderboards - may trigger those consultation obligations, whether or not organisations have thought of them in those terms. Modern awards and enterprise agreements extend those duties further.

A parliamentary inquiry into workplace digital transformation has already recommended that AI systems used in employment-related decisions be classified as high-risk, with stronger requirements around consultation, transparency and bias auditing. If a token consumption leaderboard is informing - even informally - decisions about who is performing and who is not, it sits in that high-risk category.

The fact that Amazon told employees their token data would not feed into reviews, while workers widely disbelieved it, is precisely the transparency failure that regulators are preparing to act on.

Australian CHROs now need to audit their AI-in-people-processes footprint for Fair Work exposure - identifying every place where AI is informing decisions about the workforce and assessing both the human oversight in place and whether consultation obligations have been met.

The measurement problem is also a security problem

Multiple Amazon employees told the Financial Times they were alarmed by the security profile of MeshClaw itself. The tool was granted permission to act on a user's behalf - initiating code deployments, interacting with internal systems, sending communications. One employee said: "The default security posture terrifies me. I'm not about to let it go off and just do its own thing."

An AI agent that employees are running on unnecessary tasks to inflate usage scores is an agent taking real actions in real systems. It creates code deployments that did not need to happen. It sends emails that did not need to be sent. The perverse incentive structure does not just produce misleading productivity data; it produces real operational noise.

Research found that for every 10 hours of efficiency gained through AI, nearly four are lost correcting, clarifying or rewriting AI-generated content. Add leaderboard pressure that rewards running AI more rather than running it better, and that rework figure compounds further.

What HR leaders should do now

The Amazon episode arrives at a moment when CEOs are facing board pressure to deliver measurable AI-driven outcomes. That pressure flows downstream to people teams through KPIs, adoption targets and the implicit understanding that usage statistics will be scrutinised. That pressure is not going away. But the way it is currently being transmitted into the workforce is producing the opposite of what it intends.

Measuring usage is not measuring value. Token counts, weekly active users and seat-utilisation rates tell you whether employees are running the tools. They tell you nothing about whether the tools are producing better work.

Only 2.7 per cent of the Australian-comparable workforce qualify as genuine "AI practitioners" - people who have embedded AI into their workflows and are seeing significant productivity gains. The remaining 97.3 per cent are using AI in shallow, low-value ways. A token consumption leaderboard does nothing to change that. It may make it worse.

Leaderboards drive performance theatre, not performance. The mechanism at Amazon is structurally identical to the Wells Fargo fake accounts scandal - aggressive targets tied to evaluation producing the appearance of success regardless of whether anything of value was delivered. HR can see this coming. It is much harder to reverse once the behaviour is embedded.

The ambiguity about whether metrics feed into reviews is the problem, not the solution. Amazon told employees that token statistics would not inform performance evaluations. Workers did not believe it, and behaved accordingly. Employees act on what they believe management is watching, not on what policy documents say. If there is any possibility that a metric feeds into decisions about someone's career, they will optimise for it.

Transparency is the variable that changes behaviour. The organisations performing best on AI adoption are those in which employees understand why the technology is being deployed and what it is expected to produce - not those with the most aggressive targets.

Regulators are watching. Australia's regulatory trajectory on workplace AI is clearly moving towards greater transparency, consultation and accountability. An organisation that builds tokenmaxxing-style incentive structures now, before mandatory consultation requirements are in place, is creating exactly the kind of paper trail that will be uncomfortable when those requirements arrive.

Amazon spent $200 billion this year to make AI central to how its employees work. The tokenmaxxing problem did not cost a dollar to build. It came free, with the leaderboard.

For HR professionals navigating these decisions, AI for CHROs provides frameworks for designing adoption programs that measure value rather than activity, and AI for HR Managers covers the practical implementation challenges that produce these kinds of perverse incentives.
