Peer Review Is Drowning in AI-Generated Papers, and Editors Are Scrambling to Keep Up
Submissions to Organization Science, a leading management research journal, jumped 42% after ChatGPT launched in late 2022. The papers arriving weren't better. They were harder to read, stuffed with jargon, and less likely to pass review.
A new analysis from the journal's AI Task Force examined 6,957 initial submissions and 10,389 peer reviews between January 2021 and February 2026. The findings show a clear pattern: heavier AI use tracks with weaker prose, higher rejection rates, and mounting pressure on the unpaid reviewers who keep the system running.
The problem extends beyond awkward writing. The research reveals how generative tools are colliding with academic incentive structures built to reward paper output over quality.
The Writing Quality Drop Is Measurable
Using Pangram, an AI detection tool, the journal scored submissions on a scale from zero to one. Rather than labeling individual papers as human or machine-written, the team tracked shifts across thousands of texts.
After ChatGPT's release, the pattern was sharp. Submissions with little or no AI writing declined. AI-assisted and heavily AI-generated submissions climbed. By February 2026, most papers showed at least some AI involvement. The fastest-growing segment was the most machine-heavy: manuscripts scoring above 70% on the AI scale.
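For readers who want to see how that kind of banding works in practice, here is a minimal sketch in Python. The scores, dates, and intermediate cutoffs are hypothetical placeholders; only the 0-15% and 70%+ bands come from the report, and the real Pangram scores are not public.

```python
import pandas as pd

# Hypothetical submissions: one row per manuscript with a detector score in [0, 1].
subs = pd.DataFrame({
    "received": pd.to_datetime(["2021-03-01", "2023-06-15", "2024-01-10", "2025-11-02"]),
    "ai_score": [0.02, 0.22, 0.55, 0.81],
})

# Bands echoing the categories in the article; the middle cutoffs are assumptions.
subs["band"] = pd.cut(
    subs["ai_score"],
    bins=[0, 0.15, 0.30, 0.70, 1.0],
    labels=["0-15%", "15-30%", "30-70%", "70%+"],
    include_lowest=True,
)

# Share of each band per quarter shows how the mix shifts over time.
mix = (
    subs.groupby([subs["received"].dt.to_period("Q"), "band"], observed=False)
    .size()
    .groupby(level=0)
    .transform(lambda s: s / s.sum())
)
print(mix)
```

Tracking band shares rather than per-paper verdicts is what lets the team describe population-level drift without accusing any individual author.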
Readability moved in the opposite direction. The Flesch Reading Ease score, a standard measure of how easily readers can understand text, fell sharply after late 2022. By January 2026, submission writing quality sat 1.28 standard deviations below January 2021 levels.
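The Flesch formula itself is simple enough to sketch. The function below uses the standard coefficients; the syllable counter is a crude vowel-group heuristic, and the baseline figures are hypothetical numbers chosen only to reproduce a 1.28 SD gap like the one reported.

```python
import re

def flesch_reading_ease(text: str) -> float:
    """206.835 - 1.015*(words/sentences) - 84.6*(syllables/words), the standard formula."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    # Crude syllable proxy: runs of consecutive vowels within each word.
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)

# The "1.28 SD" figure standardizes a later period's mean against a pre-ChatGPT baseline.
baseline_mean, baseline_sd = 30.0, 8.0  # hypothetical January 2021 distribution
later_mean = 19.76                      # hypothetical January 2026 mean
print(f"{(later_mean - baseline_mean) / baseline_sd:.2f}")  # -1.28: below baseline
```

Higher Flesch scores mean easier reading, so a drop of more than a full standard deviation is a large shift for a corpus of thousands of manuscripts.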
Higher AI scores correlated with lower readability (Spearman's rho = -0.4, p ≤ .001). AI-heavy writing also demanded higher reading grade levels, used more jargon, and relied more on nominalizations, the abstract noun forms that turn simple actions into bureaucratic language.
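A rank correlation like the reported rho is a one-call computation with scipy. The paired data below are synthetic, with a negative trend built in purely for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
# Synthetic paired measurements per paper, with a negative trend built in.
ai_score = rng.uniform(0, 1, 500)
readability = 40 - 12 * ai_score + rng.normal(0, 9, 500)

rho, p = spearmanr(ai_score, readability)  # rank correlation, robust to outliers
print(f"rho = {rho:.2f}, p = {p:.1e}")
```

A rank-based measure is a sensible choice here because readability scores are bounded and noisy; it only asks whether higher AI scores tend to come with lower readability, not whether the relationship is linear.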
The prose did show some improvements: less hedging, less passive voice, more specificity. But the net effect was text that was denser and harder to read.
Institutional Incentives Are Driving the Surge
The editors argue that AI alone doesn't explain the flood. The real driver is generative tools amplifying incentives already embedded in academic culture: the pressure to produce more papers.
In business schools, the UT-Dallas journal ranking list, which tracks faculty publications across 24 designated journals, is one of the strongest symbols of that pressure. The research team tested whether schools that historically responded most strongly to this ranking system also changed behavior most after ChatGPT arrived.
They did. Schools classified as stronger "UTD Responders" increased submissions after ChatGPT, with growth concentrated in papers scoring above 15% on the AI scale. The pattern held even after excluding schools in Mainland China and Hong Kong.
This matters because it shows heavy AI use is not random. It tracks with institutional reward systems.
"AI, as it's being used today, is colliding with institutional incentives to create more rather than better research," said Claudine Gartenberg, a senior editor on the team and a Wharton professor. "It's not AI on its own. It's AI plus publish-or-perish incentives."
The data also show that heavy AI use doesn't help authors. Papers with more AI writing were rejected more often at the desk stage and after review. The break point came around 30% AI use. Before ChatGPT, 11.9% of papers in the 0-15% AI category received a revise-and-resubmit decision. For papers scoring 70% or above, that figure dropped to 3.2%.
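Computing outcome rates by AI band is straightforward once decisions are tabulated. The rows below are placeholders, not the journal's data; the real analysis covers thousands of decisions across the full study window.

```python
import pandas as pd

# Placeholder decision records: pre/post ChatGPT era, AI band, and whether the
# paper received a revise-and-resubmit (R&R). The real analysis covers thousands.
df = pd.DataFrame({
    "era":  ["pre", "pre", "pre", "post", "post", "post"],
    "band": ["0-15%", "70%+", "0-15%", "0-15%", "70%+", "70%+"],
    "rr":   [1, 0, 1, 1, 0, 0],
})

# R&R rate (%) by era and band; the article reports 11.9% vs 3.2% at the extremes.
rates = df.pivot_table(index="band", columns="era", values="rr", aggfunc="mean") * 100
print(rates.round(1))
```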
Peer Review Itself Is Getting Harder to Read
The problem isn't limited to submissions. More than 30% of text-entered peer reviews now show detectable AI use. Before ChatGPT, that figure was near zero.
Like manuscripts, these reviews became harder to read as AI scores rose. They contained more jargon, more nominalization, and lower readability scores.
The content shifted too. AI-heavy reviews emphasized theory more and data less. In the journal's regression analysis, AI score was positively associated with theory emphasis and negatively associated with data emphasis.
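The reported association can be illustrated with a pair of simple regressions. The content measures and values below are hypothetical, and the journal's actual model and controls are not described here; this is only a sketch of the shape of the finding.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical review-level data: detector score plus simple content measures,
# e.g. shares of theory-related vs. data-related terms in each review.
reviews = pd.DataFrame({
    "ai_score":        [0.05, 0.10, 0.40, 0.55, 0.75, 0.90],
    "theory_emphasis": [0.10, 0.12, 0.18, 0.22, 0.25, 0.30],
    "data_emphasis":   [0.30, 0.28, 0.20, 0.18, 0.12, 0.10],
})

# Two bivariate regressions mirroring the reported signs: the ai_score
# coefficient comes out positive for theory emphasis, negative for data emphasis.
for outcome in ("theory_emphasis", "data_emphasis"):
    fit = smf.ols(f"{outcome} ~ ai_score", data=reviews).fit()
    print(outcome, round(fit.params["ai_score"], 2))
```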
Most striking: AI-heavy reviews didn't correlate with editorial decisions. Human reviews did. AI-written reviews weren't helping editors make better choices.
"It's not like the editors know that those are AI reviews and they're throwing them out," Gartenberg said. "They're reading them and they're not informing the editor's ultimate recommendation."
This leaves editors doing more work themselves, protecting journal standards but straining a system built on volunteer labor. Organization Science expanded from six deputy editors to eleven and from about 30 senior editors to about 60. Some deputy editors now handle more than 250 manuscripts annually.
The Gate Is Still Holding, for Now
Judging by detectable signals in abstracts, published articles remain overwhelmingly human-written. Heavily AI-generated manuscripts rarely make it to print. The editors are catching most weak work before publication.
But catching it requires people, and the workload is growing.
The report doesn't argue that AI has no place in science. The authors used AI themselves while preparing the analysis-for coding, outlining, phrasing, and comparing their work. Even with that assistance, their editorial scored 8.8% on Pangram and stayed within their human-first range.
"People think as they write," Gartenberg said, "and so if you don't write, you're not thinking as deeply about it."
Where the Technology Might Actually Help
The real bottleneck in publishing isn't producing papers; it's evaluating them. Journals struggle to find reviewers. Editors drown in submissions. In that context, AI might be more useful as a screening tool than as a ghostwriter.
The authors suggest several possibilities. A journal could use AI to flag unreadable prose, high jargon density, or weak alignment between claims and methods before editors invest time. It could guide reviewers toward neglected questions about data and evidence. It could serve as scaffolding, not as a substitute for expertise.
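A triage pass of that kind could be as simple as attaching flags for editors rather than making decisions. The function, thresholds, and metric names below are illustrative assumptions, not anything the report specifies.

```python
def triage_flags(fre_score: float, jargon_density: float,
                 fre_floor: float = 30.0, jargon_ceiling: float = 0.15) -> list[str]:
    """Return human-readable flags for an editor; thresholds are illustrative."""
    flags = []
    if fre_score < fre_floor:
        flags.append(f"low readability (Flesch {fre_score:.0f} < {fre_floor:.0f})")
    if jargon_density > jargon_ceiling:
        flags.append(f"high jargon density ({jargon_density:.0%} of tokens)")
    return flags

# Flags get attached to the manuscript for a human editor; nothing is auto-rejected.
print(triage_flags(fre_score=22.4, jargon_density=0.21))
```

The design point is the last comment: the tool surfaces signals and leaves the judgment to a person, which is the scaffolding-not-substitute role the authors describe.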
The team stops short of calling for automated gatekeeping. It warns that disclosure rules and outright bans won't solve the deeper issue: institutions that reward paper counts and journal placements more than sustained intellectual contribution.
What This Means for Your Work
The findings point to immediate practical needs for journals: better triage systems, better reviewer support, and policies that reduce the burden of low-quality, high-volume submissions before they consume scarce editorial attention.
For universities, the research raises a harder question about whether publication counts and journal-list incentives are now actively encouraging lower-value output.
For researchers, the message is direct: using AI to save time may backfire if it replaces the thinking that strong writing reflects. The journal's data show that heavy AI writing doesn't improve a paper's chances. It may damage them.
If AI is going to strengthen research rather than swamp it, the tool needs to be aimed at better work, not just more of it.
The full analysis is available in Organization Science.