EnvGPT Sets New Standard for Domain-Specific AI in Environmental Science
EnvGPT integrates EnvInstruct, ChatEnv, and EnvBench to train and evaluate environmental science LLMs efficiently. It matches larger models in accuracy using a focused, domain-specific approach.

EnvGPT Framework: From Data to Benchmark
The EnvGPT development pipeline is built around three core components that streamline training and evaluation for environmental science language models: EnvInstruct, a multi-agent system that generates instruction–response training data; ChatEnv, a 100-million-token domain-specific dataset; and EnvBench, a comprehensive benchmark for assessing large language models (LLMs) in environmental science.
Together, these tools create an efficient and reproducible pipeline that supports rigorous model training and performance testing across multiple environmental disciplines.
Addressing Challenges in Environmental Science AI
Environmental science covers diverse fields such as ecology, hydrology, and climate science, each with its own specialized terminology and data types. While general-purpose LLMs have made advances in fields like medicine and law, they often fall short in environmental applications due to limited exposure to domain-specific data.
Previous models like ClimateGPT and WaterGPT targeted narrow subdomains, lacking a unified approach that spans the full breadth of environmental science. This gap highlights the need for integrated frameworks that both generate high-quality environmental data and offer robust evaluation methods for LLMs.
The EnvGPT Pipeline
Published in Environmental Science and Ecotechnology on August 1, 2025, the EnvGPT framework was developed by researchers from Southern University of Science and Technology and Tsinghua University. It consists of:
- EnvInstruct: A multi-agent GPT-4 system that generated 112,946 instruction–response pairs to guide model training.
- ChatEnv: A balanced dataset created from open-access environmental journals, comprising 100 million tokens spanning five key environmental themes.
- EnvBench: A 4,998-item benchmark designed to rigorously evaluate LLMs across multiple environmental topics.
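ChatEnv is described as balanced across five themes, but the article does not say how balance was measured. As a minimal sketch, assuming "balanced" means each theme contributes a roughly equal share of tokens, a check might look like the following. The `InstructionPair` record, the placeholder theme labels, whitespace token counting and the 5% tolerance are all hypothetical and are not taken from the EnvGPT work.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class InstructionPair:
    # Hypothetical record layout: the article does not specify ChatEnv's schema.
    theme: str
    instruction: str
    response: str


def theme_token_shares(pairs):
    """Return each theme's share of the total token count.

    Tokens are approximated by whitespace-separated words (an assumption);
    a real pipeline would count tokens with the model's own tokenizer.
    """
    counts = Counter()
    for p in pairs:
        counts[p.theme] += len(p.instruction.split()) + len(p.response.split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {theme: n / total for theme, n in counts.items()}


def is_balanced(shares, tolerance=0.05):
    """True if every theme's share lies within `tolerance` of an equal split."""
    if not shares:
        return False
    target = 1 / len(shares)
    return all(abs(s - target) <= tolerance for s in shares.values())


if __name__ == "__main__":
    # Five placeholder themes; the real theme names are not given in the article.
    demo = [InstructionPair(f"theme_{i}", "a b c", "d e") for i in range(1, 6)]
    shares = theme_token_shares(demo)
    print(shares)               # each theme holds 0.2 of the tokens
    print(is_balanced(shares))  # True
```

Measuring balance by token share rather than by pair count is one design choice among several; a dataset with many short pairs in one theme could look balanced by count while being skewed by volume.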
The researchers also assembled EnvCorpus from open scientific literature to ensure comprehensive coverage of essential environmental areas. EnvGPT was then fine-tuned using low-rank adaptation (LoRA), which freezes the base model's weights and trains only small low-rank update matrices, significantly reducing computational costs without sacrificing performance.
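To show where LoRA's savings come from, here is a minimal pure-Python sketch of a single LoRA-adapted linear layer, assuming the common formulation in which the adapted output is W x + (alpha / r) B A x, with A initialized randomly and B initialized to zero. The article does not report EnvGPT's LoRA settings, so the rank, alpha, initialization scale and the 4096 x 4096 matrix size used below are illustrative assumptions.

```python
import random


def matvec(M, x):
    """Multiply a matrix (a list of rows) by a vector (a list)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update scaled by alpha / r.

    Shapes: W is d_out x d_in (frozen), A is r x d_in, B is d_out x r.
    Only A and B would be updated during fine-tuning. B starts at zero,
    so before training the layer behaves exactly like W.
    """

    def __init__(self, W, r, alpha, seed=0):
        d_out, d_in = len(W), len(W[0])
        rng = random.Random(seed)
        self.W = W
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path: W x
        update = matvec(self.B, matvec(self.A, x))  # low-rank path: B (A x)
        return [b + self.scale * u for b, u in zip(base, update)]


def trainable_params(d_in, d_out, r):
    """Trainable parameter counts for one weight matrix: (full fine-tuning, LoRA)."""
    return d_in * d_out, r * (d_in + d_out)


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], r=1, alpha=2)
    print(layer.forward([1.0, 1.0]))  # [3.0, 7.0]: equals W x because B is still zero
    full, lora = trainable_params(4096, 4096, r=8)
    print(full, lora, f"{100 * lora / full:.2f}%")  # 16777216 65536 0.39%
```

For one 4096 x 4096 matrix at rank 8, LoRA trains 65,536 parameters instead of 16,777,216, about 0.39% of the full count, which is the kind of reduction behind the lower computational cost.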
Performance Highlights
On EnvBench, EnvGPT outperformed models of similar size such as Llama-3.1-8B and Vicuna-1.5-7B. Remarkably, it matched the factual accuracy and relevance of much larger models, including Qwen2.5-72B and the closed-source GPT-4o-mini.
EnvGPT achieved 92.06% accuracy on EnviroExam, a university-level multiple-choice test, outperforming baseline models by approximately 8 percentage points. The model also demonstrated strong real-world applicability, excelling at interdisciplinary and complex reasoning tasks, as validated on the ELLE dataset.
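The EnviroExam figure is a multiple-choice accuracy, and the 8-point margin is an absolute difference in percentage points rather than a relative improvement. The sketch below illustrates that distinction with toy data. The function names, the four-item answer key and the 84.06% baseline (chosen only to produce an 8-point gap) are hypothetical and are not results from the study.

```python
def accuracy(predictions, answer_key):
    """Percentage of multiple-choice items answered correctly."""
    if len(predictions) != len(answer_key):
        raise ValueError("predictions and answer_key must have the same length")
    if not answer_key:
        raise ValueError("answer_key must not be empty")
    correct = sum(p == a for p, a in zip(predictions, answer_key))
    return 100.0 * correct / len(answer_key)


def point_gap(score, baseline):
    """Absolute difference between two accuracies, in percentage points."""
    return score - baseline


def relative_gain(score, baseline):
    """Relative improvement over the baseline, in percent."""
    return 100.0 * (score - baseline) / baseline


if __name__ == "__main__":
    print(accuracy(["A", "C", "B", "D"], ["A", "C", "D", "D"]))  # 75.0 (3 of 4 correct)
    # 84.06 is a hypothetical baseline used only to illustrate an 8-point gap.
    print(round(point_gap(92.06, 84.06), 2), round(relative_gain(92.06, 84.06), 2))
```

With these toy numbers, an 8-point gap corresponds to a relative gain of roughly 9.5%, so the two ways of reporting the same margin should not be confused.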
Implications and Future Directions
This work shows how focused fine-tuning on domain-specific data can enable compact models to compete with larger counterparts in environmental science applications. EnvGPT offers reliable, domain-aware responses to complex environmental questions, assisting researchers, educators, and policymakers alike.
The open release of ChatEnv and EnvBench promotes reproducible research and invites community contributions for continuous improvement. Future enhancements may include integrating retrieval-augmented generation and multimodal data to support real-time reasoning and keep pace with evolving scientific knowledge.
Funding and Publication
This research was supported by the National Key Research and Development Program of China (2024YFC3711800) and the High-level University Special Fund (G03050K001).
Environmental Science and Ecotechnology (ISSN 2666-4984) is a peer-reviewed, open-access journal published by Elsevier. It covers a broad spectrum of environmental science topics such as climate change, biodiversity conservation, and AI-driven environmental engineering. The journal’s latest impact factor is 14.3, according to the 2024 Journal Citation Reports™.