AI Floods Science With Papers, But Quality and Peer Review Are Buckling

LLMs turbocharge manuscript counts, thinning language barriers. Quality cues blur, and overwhelmed reviewers say slick prose can hide weaker science.

Published on: Jan 28, 2026

How AI is transforming research: More papers, less quality, and a strained review system

January 27, 2026

Featured Researchers

Mathijs De Vaan
Associate Professor, Management of Organizations

Toby E. Stuart
Professor, Management of Organizations

For many scientists, writing has always been the bottleneck. Large language models looked like the fix: clearer prose, faster drafts, fewer hurdles for non-native English speakers. Then a bigger question surfaced: Are we just producing more papers, or better science?

A new study from UC Berkeley Haas and Cornell University, published in Science, says the change cuts both ways. Output is up. Quality signals are blurred. And the review system is under strain.

"The use of AI by scientists is stressing the system. It's creating a giant bottleneck and making it really hard for evaluators to keep up," said De Vaan.

What the team analyzed

  • More than 2 million preprints (2018-June 2024) across arXiv, bioRxiv, and SSRN.
  • Detection algorithms to flag likely AI-assisted manuscripts.
  • A quantitative measure of writing complexity for each paper.
  • Whether those papers were later published in peer-reviewed journals.
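The study's actual complexity measure isn't specified here, but the idea of scoring writing complexity quantitatively can be illustrated with a simple readability-style proxy. This is a hypothetical sketch, not the researchers' method: it combines average sentence length with average word length, so denser prose scores higher.

```python
import re

def complexity_score(text: str) -> float:
    """Toy writing-complexity proxy (illustrative only):
    mean sentence length in words plus mean word length in
    characters. Longer sentences and longer words raise it."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    avg_sentence_len = len(words) / len(sentences)
    avg_word_len = sum(len(w) for w in words) / len(words)
    return avg_sentence_len + avg_word_len

simple = "We ran the test. It worked."
ornate = ("Notwithstanding considerable methodological heterogeneity, "
          "the aforementioned experimental paradigm demonstrated "
          "statistically robust efficacy across replications.")
```

On these two snippets, the ornate passage scores far higher than the plain one, which is exactly the kind of surface signal the study warns is no longer a reliable quality cue.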

Key finding: Productivity surges

Scientists who adopted LLMs published far more manuscripts than peers who did not. The lift exceeded 50% on bioRxiv and SSRN, and was over one-third on arXiv.

  • Researchers with Asian names at Asian institutions saw gains near 90% in biology and social sciences.
  • Researchers with Western names at English-speaking institutions saw 24%-46% increases.

Translation: English barriers are thinning. Talent with strong ideas but weaker English fluency can ship work faster, and in volume.

Search is better, too

LLM-powered search (e.g., Bing Chat) surfaces newer papers and relevant books more effectively than traditional tools, which tend to recycle older, heavily cited work. For anyone doing literature reviews, this means broader reach and fresher sources.

The reversal: Complexity ≠ quality

Historically, clearer and more sophisticated writing has correlated with stronger papers and higher citation counts. That proxy breaks under AI assistance.

Among AI-assisted manuscripts, higher complexity predicted a lower chance of eventual journal publication. "The robots now write more complex and sophisticated science than many human scientists," Stuart said. "But what our analysis shows is that scientific articles that were mostly automated are of substantially lower quality than human-written papers."

Why this matters

We're seeing a flood of polished but marginal work. Reviewers and editors are overwhelmed. Funders and policymakers risk backing weaker projects because surface-level signals now mislead.

If evaluation doesn't adapt, the incentives will drift toward volume over substance.

What researchers can do now

  • Use AI as an editor, not an author. Draft your argument, results, and methods first; let the model refine language after.
  • Signal real contribution up front. State the core claim, effect size, and empirical leverage in the first 150 words.
  • Raise your bar for evidence. Add robustness checks, preregistration where possible, and public code/data.
  • Keep an audit trail. Document where AI assisted (sections, prompts, versions). Transparency builds trust.
  • Upgrade your search workflow. Combine LLM search with database queries and citation chasing to avoid blind spots.
  • Sharpen prompts for literature and methods reviews. Precision in prompts yields better retrieval and synthesis. For practical prompts and workflows, see Prompt Engineering guides.

What editors and reviewers can do

  • Retire writing sophistication as a quality proxy. Weigh identification strategy, data provenance, and replicability more heavily.
  • Adopt automated triage. Use specialized reviewer agents to screen for basic thresholds (design clarity, data checks, plagiarism, statistical sanity) before human review.
  • Require disclosures. Ask authors to state where and how AI contributed to the manuscript.
  • Tighten desk-reject criteria. Prioritize novelty, causal credibility, and policy/scientific relevance.
  • Expand structured reviews. Short, criteria-based checklists reduce cognitive load and speed consistent decisions.
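The triage and checklist ideas above can be sketched as a small pre-review gate. Everything here is hypothetical: the check names and the pass/fail logic are illustrative, not drawn from any journal's actual system.

```python
from dataclasses import dataclass

@dataclass
class Manuscript:
    """Minimal stand-in for submission metadata (illustrative)."""
    has_data_availability: bool
    has_code_link: bool
    reports_sample_size: bool
    passed_plagiarism_scan: bool

def triage(ms: Manuscript) -> tuple[bool, list[str]]:
    """Run cheap automated checks before human review.
    Returns (passed, flags); flags name the failed checks."""
    checks = {
        "missing data availability statement": ms.has_data_availability,
        "no code or replication link": ms.has_code_link,
        "sample size not reported": ms.reports_sample_size,
        "failed plagiarism scan": ms.passed_plagiarism_scan,
    }
    flags = [msg for msg, ok in checks.items() if not ok]
    return (len(flags) == 0, flags)
```

The point of a structure like this is that a manuscript never reaches a reviewer's desk until the mechanical boxes are ticked, which conserves human attention for the judgments, such as identification strategy and novelty, that automation cannot make.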

What funders and institutions can do

  • Shift incentives from paper count to verifiable impact: datasets, code, replication packages, and follow-on adoption.
  • Finance community tooling: reviewer agents, replication grants, shared benchmarks for AI-drafted content.
  • Pilot new evaluation rubrics that de-emphasize language polish and emphasize contribution quality and reproducibility.
  • Support training for research teams on responsible AI use and verification practices.

Bottom line

AI makes it easier to write and ship research. It also makes it easier to look better than the work deserves.

The fix isn't to slow down. It's to upgrade how we search, write, review, and fund, so volume doesn't drown out the signal.

Read the study

Scientific production in the era of large language models
Keigo Kusumegi, Xinyu Yang, Paul Ginsparg, Mathijs de Vaan, Toby E. Stuart, Yian Yin
Science, December 2025.
