Generative Artificial Intelligence in Scientific Peer Review
Text produced by generative artificial intelligence (AI) is becoming increasingly difficult to detect, raising serious concerns about the integrity of scientific peer review.
AI Fools Detection Software in Scientific Publishing Test
Researchers from medical institutions in China tested Claude 2.0, a large language model, by asking it to generate peer review reports for 20 cancer biology papers published in eLife. The AI produced full peer reviews, rejection recommendations, citation requests, and rebuttals to unreasonable citation demands.
Popular AI detection tools struggled badly. GPTZero, a widely used detector, misclassified 82.8% of the AI-generated reviews as human-written. ZeroGPT performed only slightly better, still misidentifying 59.7% as human-authored. Despite claims of high accuracy, neither tool reliably recognized AI-generated academic writing.
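To make the reported figures concrete, here is a minimal Python sketch of how such a misclassification rate is computed. The `detect` function is a hypothetical stand-in for a commercial detector's API call, not GPTZero's or ZeroGPT's actual interface.

```python
# Minimal sketch: measuring how often a detector mislabels AI text as human.
# `detect` is a hypothetical placeholder; a real evaluation would call a
# service such as GPTZero or ZeroGPT here instead.

def detect(text: str) -> str:
    """Placeholder detector returning 'ai' or 'human' for a given text."""
    return "human"  # fixed verdict, purely for illustration

def misclassification_rate(ai_reviews: list[str]) -> float:
    """Fraction of AI-written reviews the detector mislabels as human."""
    mislabeled = sum(1 for review in ai_reviews if detect(review) == "human")
    return mislabeled / len(ai_reviews)

reviews = ["AI-generated review text ..."] * 10  # stand-in corpus
print(f"{misclassification_rate(reviews):.1%}")  # 100.0% with the placeholder;
                                                 # GPTZero's reported rate was 82.8%
```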
The AI-generated reviews maintained a professional academic tone and college-level readability, making them difficult for both detection software and human readers to distinguish from genuine peer reviews.
Beyond Detection: AI’s Concerning Capabilities
The more alarming issue is what the AI can do. When asked to reject research papers, over 76% of the AI’s rejection comments received above-average ratings from expert reviewers. This suggests that AI can produce convincing, credible-sounding critiques capable of influencing editorial decisions.
Additionally, the AI created plausible but irrelevant citation requests, some of which experts scored a perfect five. It even justified citing unrelated papers across fields as distant as materials science and medical research, raising concerns about misuse such as artificially inflating citation metrics.
However, the AI struggled to match detailed human reviews: it compared well against broad, superficial feedback but failed to replicate the depth and specificity of expert critiques.
What This Means for Scientific Research
Current safeguards are ill-equipped to handle these challenges. Malicious reviewers could exploit AI to generate convincing but biased or manipulative content, undermining the peer review process and possibly suppressing valid scientific work.
There is also a risk that reviewers might use AI to boost their own citation counts by inserting unjustified references to their papers. On the positive side, the AI showed promise in generating rebuttals to unreasonable citation requests, indicating it could also assist in detecting manipulation.
Academic journals must act quickly. The study recommends mandatory disclosure of AI use by reviewers, similar to disclosure policies for authors. Publishers need clear operational guidelines to handle suspected AI-generated reviews. Additionally, AI providers should consider implementing restrictions to prevent misuse.
Given the study's limited scope of 20 cancer biology papers, further research across disciplines is necessary. Yet the core issue is evident: as AI advances and detection tools lag behind, the scientific community must develop new protections that maintain research integrity without hindering legitimate use of the technology.
Study Summary
- Methodology: Claude 2.0 generated peer reviews, rejection recommendations, citation requests, and rebuttals for 20 cancer biology papers from eLife. Each task was repeated three times to check consistency. Two expert oncology reviewers scored the AI content on a five-point scale. The AI detection tools ZeroGPT and GPTZero were tested on their ability to identify the AI-generated text (a minimal code sketch of this pipeline follows the summary).
- Results: AI detection tools failed significantly, with GPTZero misclassifying 82.8% of AI reviews as human-written. The AI generated convincing rejection comments and plausible citation requests, even for unrelated research fields, but it struggled to match the detail and specificity of human reviewers' critiques.
- Limitations: The study covered only 20 cancer biology papers and focused on Claude 2.0 and two detection tools. Manual expert scoring limited sample size, and results may not generalize across all scientific fields or AI systems.
- Funding and Disclosures: No funding sources or conflicts of interest were declared.
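To illustrate the methodology in code, here is a minimal sketch of the generate-then-detect pipeline, assuming the Anthropic Python SDK. It is not the authors' actual code: the prompt wording and settings are illustrative assumptions, and the detection and scoring steps are indicated in comments because the study used commercial tools and human experts for those stages.

```python
# Minimal sketch of the study's generate-then-detect pipeline, assuming the
# Anthropic Python SDK (pip install anthropic). Not the authors' code; the
# prompt and settings are illustrative assumptions.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def generate_review(manuscript_text: str) -> str:
    """Ask the model to draft a peer review of the given manuscript."""
    response = client.messages.create(
        model="claude-2.0",  # the legacy model evaluated in the study
        max_tokens=1500,
        messages=[{
            "role": "user",
            "content": "Write a peer review report for this manuscript:\n\n"
                       + manuscript_text,
        }],
    )
    return response.content[0].text

# In the study's design, each review was generated three times per paper for
# consistency, scored by two expert oncology reviewers on a five-point scale,
# and then submitted to the detectors (GPTZero, ZeroGPT) for classification.
```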
For those interested in further exploring AI’s impact on research workflows and how to responsibly integrate AI tools, consider visiting Complete AI Training for curated AI education resources.
Reference: Zhu, L., Lai, Y., Xie, J., et al. “Evaluating the potential risks of employing large language models in peer review,” Clinical and Translational Discovery, June 27, 2025. DOI: 10.1002/ctd2.70067