AI Matches Human Experts in Retrieving Conservation Evidence, Study Finds

Paired with hybrid search over an evidence database, large language models matched human experts in retrieving conservation evidence, reaching up to 97.8% accuracy while answering far faster. The approach could support quicker, better-grounded biodiversity decisions.

Published on: Jun 17, 2025

Researchers from Imperial College London and the University of Cambridge have demonstrated that artificial intelligence (AI) can perform on par with human experts when retrieving conservation evidence. This finding addresses a critical challenge in biodiversity conservation: despite abundant scientific research, evidence often fails to translate effectively into conservation practice.

The Challenge of Conservation Evidence

Conservation efforts sometimes rely on interventions with limited evidence of effectiveness. Take bat gantries in the UK, structures designed to help bats avoid road traffic: though they have cost over £1 million, studies show the gantries largely fail to meet their goals. Similarly, some measures under the European Union's Common Agricultural Policy have not delivered expected biodiversity benefits despite scientific recommendations for better alternatives.

A major hurdle is that conservation knowledge is dispersed across countless studies, making it difficult for practitioners to quickly find relevant, trustworthy information. While databases like Conservation Evidence gather this data, efficient access remains a problem. This raises a key question: can AI help retrieve and interpret conservation evidence more effectively?

The Role of AI in Conservation

Using general chatbots for evidence-based conservation answers is unreliable. The way AI systems retrieve information must be carefully designed to avoid errors and misinformation. Large Language Models (LLMs) like GPT-4o and Claude 3.5 Sonnet excel at processing text and generating responses, but they can also produce inaccuracies or biased outputs if not properly managed.

A research team from Imperial College London and the University of Cambridge explored whether LLMs could accurately extract information from the Conservation Evidence database. Their study, published in PLOS ONE (May 2025), compared AI performance against that of human conservation experts.

Testing AI Models on Conservation Evidence Retrieval

The team simulated an exam environment. They generated thousands of multiple-choice questions based on Conservation Evidence database entries, each tied to specific database pages summarizing the impacts of conservation actions. Ten LLMs were tested on their ability to locate and interpret the correct information.
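
To make that setup concrete, here is a minimal sketch of what such an exam harness can look like, assuming a standard multiple-choice format. The sample question and the `ask_model` stand-in are hypothetical illustrations, not the study's actual questions, prompts, or model calls.

```python
# Minimal sketch of an exam-style evaluation harness.
# The sample question and ask_model() are hypothetical stand-ins,
# not the study's actual data or API calls.

def ask_model(question: str, options: dict[str, str]) -> str:
    """Hypothetical stand-in for a call to the LLM under test; a real
    harness would send the question and options to the model's API and
    parse the option letter from its reply."""
    return "A"  # fixed placeholder so the sketch runs end to end

questions = [
    {
        "question": "What effect did installing bat gantries have on bat road crossings?",
        "options": {
            "A": "Strong benefit",
            "B": "Little or no effect",
            "C": "Harmful",
            "D": "Not studied",
        },
        "answer": "B",  # each question is tied to one Conservation Evidence page
    },
    # ... thousands more, generated from database entries
]

def accuracy(qs: list[dict]) -> float:
    """Fraction of questions the model answers with the correct letter."""
    correct = sum(ask_model(q["question"], q["options"]) == q["answer"] for q in qs)
    return correct / len(qs)

print(f"Mean accuracy: {accuracy(questions):.1%}")
```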

Human experts from the Conservation Evidence team answered a smaller subset of questions to establish a performance benchmark. Various AI setups were evaluated:

  • Closed Book: AI relied solely on pre-existing knowledge without database access.
  • Open Book (Oracle): AI had direct access to the relevant database page.
  • Open Book (Confused): AI received both relevant and unrelated database texts.
  • Open Book (Retrieval): AI searched the database using sparse (keyword-based), dense (semantic-based), and hybrid retrieval methods.

The hybrid approach combined sparse and dense methods, selecting the most relevant pages from each.
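
As an illustration of that hybrid step, the sketch below ranks a toy corpus with BM25 for the sparse pass and sentence embeddings for the dense pass, then takes the union of each method's top pages as the context handed to the LLM. The corpus, embedding model, and top-k of two are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch of hybrid retrieval over evidence summaries, combining
# the rank_bm25 and sentence-transformers libraries. The toy corpus,
# model choice, and TOP_K are illustrative assumptions.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

pages = [
    "Install bat gantries over roads: studies found little evidence of effectiveness.",
    "Plant wildflower strips on farmland: increased pollinator abundance.",
    "Provide artificial roosts for bats: mixed results across studies.",
]
query = "Do bat gantries help bats avoid road traffic?"
TOP_K = 2

# Sparse pass: keyword-based BM25 ranking.
bm25 = BM25Okapi([p.lower().split() for p in pages])
sparse_scores = bm25.get_scores(query.lower().split())
sparse_top = sorted(range(len(pages)), key=lambda i: -sparse_scores[i])[:TOP_K]

# Dense pass: semantic ranking with sentence embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")
page_emb = model.encode(pages, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)
dense_scores = util.cos_sim(query_emb, page_emb)[0]
dense_top = sorted(range(len(pages)), key=lambda i: -float(dense_scores[i]))[:TOP_K]

# Hybrid: union of each method's top pages, in order, without duplicates;
# these pages become the context passed to the LLM alongside the question.
context = [pages[i] for i in dict.fromkeys(sparse_top + dense_top)]
print(context)
```

In the study's Open Book (Retrieval) setting, retrieved pages play this context role; the Oracle and Confused settings instead fix the context to the correct page alone, or to a mix of correct and unrelated pages.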

Results: AI Matches or Exceeds Human-Level Accuracy

AI models configured with hybrid retrieval showed the best results. GPT-4o, Llama 3.1 70B, and Gemma 2 27B achieved mean accuracy between 95.6% and 97.8%, slightly outperforming the human experts' average of 94.8%, though the difference was not statistically significant. This indicates AI can match expert-level accuracy in retrieving and interpreting conservation evidence.

Hybrid retrieval also led in retrieval accuracy (88.9%) compared with sparse (71.1%) and dense (80.0%) methods, and even slightly exceeded the human experts' retrieval accuracy (87.8%). AI performance dropped sharply without access to the database (Closed Book), with accuracy falling to 62.6%-69.8%.

One model, Llama 3.1 8B Instruct Turbo, performed below expert level, achieving 86.7% accuracy. Additionally, AI delivered answers almost instantly, while human experts averaged over two minutes per question, highlighting a major efficiency advantage.

Implications for Conservation Decision-Making

The study highlights that AI tools must be carefully configured and connected to relevant databases to provide reliable, evidence-based answers. Off-the-shelf LLMs, without access to up-to-date, domain-specific data, risk inaccuracies that could misinform conservation decisions.

Properly set up AI systems can help conservation practitioners quickly access and summarise scientific evidence, potentially speeding up and improving the quality of decision-making. Future research should address more complex conservation questions requiring nuanced reasoning, and consider ethical issues such as fairness, environmental impact, and maintaining the role of human judgement.

Acknowledgements

This research was conducted by Radhika Iyer, Sam Reynolds, and William Sutherland (University of Cambridge); Alec Christie (Centre for Environmental Policy, Imperial College London); and Sadiq Jaffer and Anil Madhavapeddy (University of Cambridge). Support came from the AI@Cam initiative, the UROP scheme, and donors including Tarides and John Bernstein.