AI energy use: New tools show which model consumes the most electricity, and why
February 23, 2026
AI's energy bill is no longer guesswork. Open-source software and an online leaderboard from the University of Michigan now let users and developers measure the electricity different AI models consume for common tasks: chat, image and video generation, problem solving, and coding.
Teams can run the software on their own hardware to evaluate private and open-weight models. While it can't measure queries sent to proprietary services inside private data centers, it enables apples-to-apples comparisons for open-weight models where parameters are publicly available.
Why this matters
Most of AI's electricity use, roughly 80% to 90%, happens during inference, not training. As models get larger and usage grows, the strain scales with it. In 2024, U.S. data centers consumed about 4% of the nation's electricity, with demand projected to roughly double by 2030.
Despite the stakes, popular AI benchmarks don't report energy use. Instead, rough "envelope" estimates multiply a GPU's maximum power draw by the number of GPUs, which is useful for a ceiling but disconnected from how models actually run.
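To see why such a ceiling overstates real consumption, here is a minimal sketch of the envelope calculation; the GPU count, power rating, and runtime below are hypothetical, not figures from the article:

```python
def envelope_estimate_kwh(gpu_max_watts: float, num_gpus: int, hours: float) -> float:
    """Theoretical ceiling: assumes every GPU draws maximum power the whole time."""
    return gpu_max_watts * num_gpus * hours / 1000.0

# Hypothetical cluster: 8 GPUs rated at 700 W each, running for 1 hour.
ceiling = envelope_estimate_kwh(700, 8, 1.0)
print(ceiling)  # 5.6 kWh -- an upper bound, not what the workload actually drew
```

Real workloads idle between requests and rarely saturate every GPU, which is why measured numbers can sit far below this bound.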
"If you want to optimize energy efficiency and minimize environmental impact, knowing the energy requirements of the models is critical, but popular benchmarks for assessing AI ignore this aspect of performance," said Mosharaf Chowdhury, associate professor of computer science and engineering.
What the Michigan team built
The group developed open-source measurement tools and an online leaderboard that captures model-by-model energy use on real tasks. Their latest leaderboard update surfaced wide gaps in consumption: for some tasks, open-weight models differ by up to 300x in energy required.
The team has also produced tutorials to help practitioners measure and reduce energy costs, including material presented at the NeurIPS Conference.
Key finding: tokens drive the bill
A core driver of energy use is the number of generated tokens. Large language models that generate wordier outputs burn more electricity than concise ones. Reasoning-focused models also consume more because they produce longer "chains of thought," often 10-100x more tokens per request.
How a model is run matters too. Batching queries reduces total data center energy, though larger batches increase latency. Even the choice of memory allocation software can change a model's energy footprint.
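The batching trade-off can be made concrete with a toy cost model, assuming a fixed per-batch overhead that gets amortized across requests while requests wait for the whole batch; all constants are made up for illustration:

```python
def batch_tradeoff(batch_size: int,
                   fixed_wh_per_batch: float = 1.0,
                   wh_per_request: float = 0.05,
                   secs_per_request: float = 0.1):
    """Toy model: fixed batch overhead amortizes, but latency grows with batch size."""
    energy_per_request = fixed_wh_per_batch / batch_size + wh_per_request
    latency = batch_size * secs_per_request  # last request waits for the full batch
    return energy_per_request, latency

for b in (1, 8, 32):
    e, t = batch_tradeoff(b)
    print(f"batch={b:2d}  energy/req={e:.3f} Wh  latency={t:.2f} s")
```

Larger batches push energy per request down and latency up, which is why the right setting depends on the latency budget rather than energy alone.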
"There are many ways to deploy AI and translate what the model wants to do into computations on the hardware," said Jae-Won Chung, doctoral student and first author. "Our tool can automate the search through that parameter space and find the most efficient set of parameters based on the user's needs."
From estimates to measurements
"A lot of people are concerned about AI's growing energy use, which is fair," said Chowdhury. "However, many who worry can be overly pessimistic, and those who want more data centers are often overly optimistic. The reality is not black and white, and there's a lot we don't know because nobody is making direct measurements of AI power use available. Our tool can provide more accurate data for better decision-making."
In other words: move away from theoretical ceilings and measure what your stack actually does under load.
Practical steps for researchers and developers
- Measure on your hardware: Use energy measurement tools to profile your models on representative workloads. Capture per-request kWh across chatting, coding, and generation tasks.
- Control tokens: Cap max tokens and prefer concise decoding strategies where feasible. If your use case doesn't need long reasoning traces, limit chain-of-thought generation.
- Tune batching intentionally: Increase batch size to reduce energy per request when latency budgets allow. Validate the throughput/latency/energy trade-off with real traffic.
- Test deployment parameters: Evaluate different memory allocators and runtime settings. Small scheduling and memory decisions can shift energy use meaningfully.
- Report energy alongside accuracy: When benchmarking, include energy per task and per token. Track changes over time as models, prompts, and configs evolve.
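The first and last steps above can be sketched as a minimal power-trace integrator and a per-token report. In practice the samples would come from a hardware counter (for example, NVML's GPU power readings); here the trace and token count are hypothetical:

```python
def energy_wh(power_samples_w: list[float], interval_s: float) -> float:
    """Integrate evenly spaced power samples (watts) into watt-hours."""
    joules = sum(power_samples_w) * interval_s  # rectangle rule
    return joules / 3600.0

# Hypothetical power trace sampled every 0.5 s during one batch of requests.
trace = [250, 310, 305, 298, 260]
per_batch = energy_wh(trace, 0.5)

tokens_generated = 1200  # hypothetical token count for the batch
print(f"{per_batch:.4f} Wh/batch, "
      f"{per_batch / tokens_generated * 1000:.4f} mWh/token")
```

Logging the per-token figure next to accuracy for each config change makes regressions in energy as visible as regressions in quality.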
What we still don't know
The picture remains incomplete for proprietary models running in private data centers, where direct energy data isn't reported. As demand grows, better transparency and standardized reporting will help calibrate policy, procurement, and infrastructure planning.
People and place
The work is led by Mosharaf Chowdhury and his team, including Jae-Won Chung, at U-M's Michigan Academic Computing Center, a two-megawatt facility used for academic research in Ann Arbor, Michigan.
Funding and support
The project received partial support from the National Science Foundation, with additional grants and gifts from VMware, the Mozilla Foundation, Cisco, Ford, GitHub, Salesforce, Google, and the Kwanjeong Educational Foundation.