AI Reviewing AI: 82% of Fabricated Papers Get Accepted

LLM reviewers green-light AI-generated papers roughly 4 out of 5 times, a new analysis finds. Without human sign-off and stronger checks, shaky work can seep into the literature.

Categorized in: AI News, Science and Research
Published on: Nov 12, 2025

AI peer reviewers are green-lighting AI-fabricated papers. Often.

New evidence suggests large language model (LLM) "reviewers" recommend acceptance for AI-generated manuscripts roughly 4 out of 5 times. The analysis, posted Oct. 20 as an arXiv preprint (not yet peer reviewed), shows how easily automated review loops can normalize unsound work.

Researchers generated 600 fake manuscripts using GPT-5, then asked three other OpenAI models (o3, o4-mini, and GPT-4.1) to review them. Despite flagging some integrity issues, the AI reviewers still recommended acceptance up to 82% of the time.

"AI can be misused to attack this vulnerable system," says study author Fengqing Jiang of the University of Washington, who has not released the underlying BadScientist code to avoid misuse. While the test set focused on computer science, the same setup could be tuned to produce manuscripts in other fields.

The team submitted this work to AI Agents for Science, a conference where submissions were written and reviewed exclusively by AI. It's a live experiment in whether AI can generate hypotheses, methods, and results at acceptable quality, and where the guardrails fail.

Why this matters for researchers, editors, and lab leads

AI-only loops are now plausible: AI generates a study, AI reviews it, and the cycle repeats. That risks a flood of plausible-sounding but unsound papers that pass basic checks and pollute the literature.

Once bad citations and synthetic results enter the reference chain, they're hard to unwind. You don't need intent to deceive for this to be a problem; speed and scale are enough.

What the study signals

  • Surface-level critique isn't enough: Models can spot issues yet still over-recommend acceptance.
  • Incentives favor passivity: AI reviewers don't bear reputational cost, so they default to approval.
  • Field portability: If it works for CS, it can be adapted for chemistry, biology, and beyond.

Practical safeguards you can implement now

  • Human-in-the-loop by policy: Require named human sign-off for every accept/reject. AI can assist, not decide.
  • Integrity scoring: Add a rubric that weights red flags: unverifiable citations, missing data/code, statistical impossibilities, method-result mismatches (see the scoring sketch after this list).
  • Provenance statements: Mandate disclosure of AI use in writing, analysis, and visualization. Ask for prompts, model versions, and generation dates when feasible.
  • Identity and origin checks: Verify author identities (e.g., ORCID), require data/code deposits, and spot-check for synthetic references and duplicate text across submissions (see the DOI spot-check sketch after this list).
  • Hybrid review workflow: Allow LLMs for summaries and checklists, but not as the reviewer of record. Any AI-produced critique must be audited by humans.
  • Repro-lite: For empirical work, run a minimal reproduction: load data, execute core analysis, and compare key numbers to the manuscript (see the repro-lite sketch after this list).
  • Reviewer training: Teach editors and referees how to detect AI-generated text, images, and "too-clean" narratives, and how to use AI as a skeptical assistant, not a rubber stamp.
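
To make the integrity-scoring idea concrete, here is a minimal sketch of a weighted red-flag rubric. The flag names, weights, and escalation threshold are illustrative assumptions, not values from the study or any journal's policy.

```python
# Minimal integrity-scoring sketch: flag names, weights, and the threshold
# below are illustrative assumptions, not values from the study.
RED_FLAG_WEIGHTS = {
    "unverifiable_citation": 3,
    "missing_data_or_code": 2,
    "statistical_impossibility": 4,
    "method_result_mismatch": 3,
}

def integrity_score(flags: dict) -> int:
    """Sum weighted counts of red flags; higher means more scrutiny needed."""
    return sum(RED_FLAG_WEIGHTS.get(name, 0) * count for name, count in flags.items())

def triage(flags: dict, threshold: int = 5) -> str:
    """Route a submission: at or above the threshold, escalate to a human editor."""
    return "escalate_to_human_editor" if integrity_score(flags) >= threshold else "standard_review"

# Example: two unverifiable citations plus one method-result mismatch.
print(triage({"unverifiable_citation": 2, "method_result_mismatch": 1}))
# -> escalate_to_human_editor
```

The point of the rubric is to make the acceptance bar explicit and auditable instead of leaving it to a model's default agreeableness.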
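For the reference spot-check, one lightweight approach is to resolve each cited DOI against the public Crossref REST API; a DOI that doesn't resolve is a candidate synthetic reference, not proof of fraud. This sketch assumes the requests package is installed, and the DOIs shown are placeholders.

```python
# Synthetic-reference spot-check via the Crossref REST API.
# A non-resolving DOI is a flag for manual follow-up, not proof of fabrication.
import requests

def doi_resolves(doi: str, timeout: float = 10.0) -> bool:
    """Return True if Crossref knows the DOI, False otherwise."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=timeout)
    return resp.status_code == 200

# Placeholder DOIs for illustration only.
cited_dois = ["10.1000/example.real", "10.9999/possibly.fabricated"]
suspect = [doi for doi in cited_dois if not doi_resolves(doi)]
print("DOIs to check by hand:", suspect)
```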
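And a repro-lite pass can be as small as recomputing one headline number from the deposited data. The file name, column, and reported value here are hypothetical placeholders; the sketch assumes pandas is installed.

```python
# Repro-lite sketch: recompute one headline statistic from the deposited data
# and compare it to the manuscript. File, column, and values are placeholders.
import pandas as pd

REPORTED_MEAN = 0.73   # value claimed in the manuscript (placeholder)
TOLERANCE = 0.01       # acceptable absolute deviation

df = pd.read_csv("deposited_data.csv")      # data deposit required by policy
recomputed = df["primary_outcome"].mean()   # re-run the core analysis step

if abs(recomputed - REPORTED_MEAN) > TOLERANCE:
    print(f"Mismatch: manuscript reports {REPORTED_MEAN}, reproduction gives {recomputed:.3f}")
else:
    print("Headline number reproduces within tolerance.")
```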

Community sentiment

A 2025 Nature survey of more than 5,200 scientists reports that over 90% find it acceptable to use generative AI to edit or translate their own work. But 60% say it isn't acceptable to use generative AI to conduct peer review, though 57% are fine with AI assisting reviewers by answering questions about a paper.

What to watch next

  • Policy updates: Expect clearer rules from journals and funders on AI disclosure, reviewer conduct, and minimum reproducibility.
  • Verification tech: Growth in provenance tools, reference validation, and automated checks for statistical anomalies and data leakage.
  • Benchmarking: Shared test suites to evaluate AI reviewers on rigor, not just convenience.

Bottom line: AI can accelerate peer review, but it should raise the bar for scrutiny, not lower it. Put humans on the hook for decisions, make integrity measurable, and treat AI as a tool to stress-test claims, not wave them through.

If you need to upskill reviewers and authors on safe, high-impact AI use in research, see our curated programs: AI courses by job.

