AI Reasoning Models Emit Up to 50 Times More CO₂ Than Standard LLMs, Study Finds
Advanced AI reasoning models emit up to 50 times more CO₂ than concise LLMs answering the same questions. This trade-off raises concerns about sustainability in AI use.

Efforts to improve AI accuracy come with a significant environmental cost. Recent research reveals that advanced reasoning models—those designed for tasks in algebra, philosophy, and other complex fields—can produce up to 50 times more carbon dioxide emissions than more concise large language models (LLMs) when responding to identical prompts.
These specialized models, including Anthropic's Claude, OpenAI's o3, and DeepSeek's R1, invest more computational resources and time to generate accurate answers. However, their enhanced reasoning capabilities lead to a much larger carbon footprint, raising concerns about sustainability in AI deployment.
The Energy Behind AI Reasoning
LLMs process language by breaking it down into tokens—small word chunks converted into numerical data. These tokens pass through neural networks trained on vast datasets to predict and generate responses. Reasoning models take this further by employing a "chain-of-thought" approach. This technique decomposes complex questions into smaller, logical steps, mimicking human problem-solving.
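To make the effect concrete, the following sketch (not taken from the study) contrasts a concise answer with a chain-of-thought-style answer to the same question and counts tokens with a naive whitespace split. Real tokenizers divide text differently, but the relative inflation is the point: every extra token means another pass through the network, and therefore more energy.

```python
# Illustrative only (not from the study): the same question answered concisely
# versus with an explicit chain of thought. Token counts use a naive
# whitespace split; production tokenizers differ, but the inflation is similar.

question = "A train travels 120 km in 1.5 hours. What is its average speed?"

concise_answer = "80 km/h"

chain_of_thought_answer = (
    "Step 1: Average speed is distance divided by time. "
    "Step 2: The distance is 120 km and the time is 1.5 hours. "
    "Step 3: 120 / 1.5 = 80. "
    "Therefore, the average speed is 80 km/h."
)

def rough_token_count(text: str) -> int:
    """Very rough token estimate: split on whitespace."""
    return len(text.split())

print("concise tokens:", rough_token_count(concise_answer))                    # 2
print("chain-of-thought tokens:", rough_token_count(chain_of_thought_answer))  # 36
```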
While this method improves accuracy, it also demands significantly more energy. The increased computational load results in higher carbon emissions, which may present economic and environmental challenges for organizations using these models.
Quantifying the Carbon Cost
To measure the impact, researchers tested 14 LLMs, ranging from 7 to 72 billion parameters, on 1,000 questions spanning a variety of subjects. Using the Perun framework on an NVIDIA A100 GPU, they tracked the energy each model consumed and converted it into CO₂ emissions, assuming 480 grams of CO₂ per kilowatt-hour.
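As a sketch of that conversion, the snippet below applies the study's 480 g CO₂/kWh grid-intensity assumption to measured energy figures; the energy values shown are placeholders, not numbers from the paper.

```python
# Energy-to-emissions conversion as described above. Only the 480 g CO2/kWh
# factor comes from the study; the energy figures are made-up placeholders.

GRAMS_CO2_PER_KWH = 480.0  # grid-intensity assumption used in the study

def emissions_grams(energy_kwh: float) -> float:
    """Convert measured GPU energy (kWh) into grams of CO2 equivalent."""
    return energy_kwh * GRAMS_CO2_PER_KWH

# Hypothetical energy measurements for running the 1,000-question benchmark:
concise_model_kwh = 0.3    # placeholder
reasoning_model_kwh = 4.2  # placeholder

print(f"concise model:   {emissions_grams(concise_model_kwh):.0f} g CO2e")
print(f"reasoning model: {emissions_grams(reasoning_model_kwh):.0f} g CO2e")
```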
Results showed reasoning models generated an average of 543.5 tokens per question, compared to just 37.7 tokens for concise models. This token difference translated directly into higher emissions. For example, the most accurate model tested, Cogito with 72 billion parameters, delivered 84.9% accuracy but produced three times the emissions of similarly sized, less verbose models.
- Models with emissions under 500 grams of CO₂ equivalent did not exceed 80% accuracy on the test questions.
- Questions requiring longer reasoning, like those in algebra or philosophy, caused emissions to spike sixfold compared to straightforward queries.
- Model choice greatly affected emissions: answering 60,000 questions with DeepSeek's R1 model would emit as much CO₂ as a round-trip flight between New York and London, while Alibaba Cloud's Qwen 2.5 model could reach similar accuracy with just one-third of those emissions (a rough scaling sketch follows this list).
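The scaling sketch below uses the averages reported above together with a simplifying assumption, not made by the study itself, that per-question emissions grow roughly in proportion to the number of generated tokens.

```python
# Back-of-the-envelope scaling using the averages reported above.
# Simplifying assumption (not from the study): emissions per question scale
# roughly linearly with the number of generated tokens.

reasoning_tokens_per_q = 543.5  # average tokens per question, reasoning models
concise_tokens_per_q = 37.7     # average tokens per question, concise models

token_ratio = reasoning_tokens_per_q / concise_tokens_per_q
print(f"token inflation: ~{token_ratio:.0f}x per question")  # ~14x

# Scaled to a large workload, such as the 60,000-question example above:
questions = 60_000
extra_tokens = questions * (reasoning_tokens_per_q - concise_tokens_per_q)
print(f"extra generated tokens over {questions:,} questions: ~{extra_tokens:,.0f}")
```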
Implications for AI Deployment
The findings highlight a clear trade-off between AI accuracy and environmental impact. Users and organizations should weigh the benefits of advanced reasoning models against their carbon footprint, especially when deploying AI at scale.
Since emissions vary with hardware and energy sources, the study encourages transparency around the environmental cost of AI outputs. Being aware of these costs can lead to more thoughtful use of AI technologies, potentially reducing unnecessary energy consumption.
For those interested in practical AI skills and understanding the balance between performance and sustainability, exploring targeted AI courses can provide valuable insights.