AI research is drowning in slop. Here's what to do about it
One person says he authored 113 AI papers this year. Eighty-nine are slated to be presented this week at a leading AI conference. The author, Kevin Zhu, recently finished a CS bachelor's, runs a mentoring company called Algoverse, and lists high school students as co-authors on many papers.
Plenty of researchers see this as a symptom, not an outlier. Hany Farid, a Berkeley professor, called the work a "disaster" and "vibe coding." Others report seeing similar publication frenzies. The shared concern: volume is swamping quality, and the review pipeline can't keep up.
What's fueling the flood
Submission counts are exploding. NeurIPS saw 21,575 submissions this year, up from under 10,000 in 2020. ICLR reported a 70% jump year over year for its 2026 cycle. Reviewers say average scores are dropping, and some submissions read like they were written by AI.
The review process is thin. Conferences lean on fast cycles, limited revision, and armies of student reviewers. That setup rewards quantity, clever packaging, and hype. It penalizes slow, careful work.
Career pressure adds gasoline. Students and early-career researchers are pushed to stack lines on a CV. Mentoring outfits (including Algoverse, which charges thousands for a 12-week program and help with submissions) pitch publications as a ticket to opportunities.
The case: Zhu and Algoverse
Zhu says the 100+ papers are team projects he supervised through Algoverse. He says he reviews methodology and drafts, and that domain-specific projects involve mentors with relevant expertise. The teams use standard productivity tools and sometimes language models for clarity or copy edits. Critics counter that meaningful authorship across that many papers in a single year is implausible.
Why this matters
The signal-to-noise ratio is collapsing. Big tech and safety orgs push unreviewed work to arXiv. Conferences are overwhelmed. Journalists, the public, and even experts struggle to track what's real, what's incremental, and what's flawed. That's how bad ideas sneak into products, policy, and public discourse.
Practical fixes for researchers, reviewers, and leaders
- Raise the bar for authorship: Publish author contribution statements (CRediT-style). No honorary bylines. If your name is on the paper, you should be able to defend its methods and results.
- Pre-commit experiments: Before you touch code, write a short plan covering datasets, splits, baselines, metrics, an acceptable compute budget, and stopping rules. Treat post-hoc fishing as a red flag. A minimal pre-registration sketch follows this list.
- Lock evaluation: Use strict data hygiene. Document contamination checks. Keep a blind holdout. Report results over multiple seeds with variance, as in the second sketch after this list. Avoid cherry-picked screenshots or best-of-N runs.
- Demand artifacts: Release code, configs, weights (if possible), and exact scripts to reproduce tables. Submit to artifact evaluation when offered. Use a checklist like the NeurIPS Reproducibility Checklist (PDF).
- Ablations or it didn't happen: Show what each component contributes. Remove parts and report the drop. If the method is mostly prompt tuning or data curation, say so plainly.
- Be explicit about AI assistance: If language models were used for writing or coding, disclose scope and guardrails. Hidden assistance erodes trust fast.
- Reviewer triage: Start with deal-breakers: missing code or data, no contamination checks, no baselines, no ablations, or shaky statistics. If a submission fails on any of them, stop there.
- Calibrate incentives: Hiring and promotion committees should ask for your top two or three papers, with detailed contribution notes and artifacts, not a publication spreadsheet. Reward replication and negative results.
- Team hygiene: Add a pre-submission review. Replicate key baselines internally. Require a one-page "threats to validity" note that calls out shortcuts and limitations.
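To make the pre-commit idea concrete, here is a minimal sketch of what a frozen experiment plan might look like in Python. Every field name, dataset, and threshold below is an illustrative placeholder, not a standard; the point is simply to write the plan down and hash it before the first training run.

```python
import hashlib
import json
from datetime import datetime, timezone

# Illustrative pre-registration plan; every value here is a placeholder.
plan = {
    "hypothesis": "Method X beats the strongest baseline on task Y by >= 1 point",
    "datasets": ["dataset_a", "dataset_b"],           # fixed before any runs
    "splits": {"train": 0.8, "val": 0.1, "test": 0.1},
    "baselines": ["baseline_1", "baseline_2"],
    "metrics": ["accuracy", "f1"],
    "seeds": [0, 1, 2, 3, 4],
    "compute_budget_gpu_hours": 200,
    "stopping_rule": "stop when budget is spent or val metric plateaus for 3 evals",
}

# Freeze the plan: store it with a timestamp and a content hash so later
# deviations are visible rather than silently absorbed.
record = {
    "frozen_at": datetime.now(timezone.utc).isoformat(),
    "plan": plan,
}
record["sha256"] = hashlib.sha256(
    json.dumps(plan, sort_keys=True).encode()
).hexdigest()

with open("experiment_plan.json", "w") as f:
    json.dump(record, f, indent=2)

print("Pre-registered plan hash:", record["sha256"][:12])
```

Committing the file, or posting the hash somewhere public, is one low-friction way to make post-hoc changes visible; a formal preregistration service works just as well.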
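For the evaluation bullet, here is a rough sketch of multi-seed reporting. It assumes a hypothetical run_experiment(seed) function standing in for whatever your pipeline actually produces; the seed list, function, and metric are all made up for illustration.

```python
import random
import statistics

def run_experiment(seed: int) -> float:
    """Placeholder for a real training/eval run; returns one metric value."""
    rng = random.Random(seed)
    return 0.80 + rng.uniform(-0.02, 0.02)  # stand-in for a real score

SEEDS = [0, 1, 2, 3, 4]  # fixed up front, not chosen after seeing results

scores = [run_experiment(s) for s in SEEDS]
mean = statistics.mean(scores)
std = statistics.stdev(scores)

# Report every seed plus mean and spread, not the single best run.
print("per-seed:", [f"{s:.3f}" for s in scores])
print(f"mean ± std over {len(SEEDS)} seeds: {mean:.3f} ± {std:.3f}")
```

The same habit extends to ablations: run each variant over the same fixed seed list and report the drop with its variance, not one lucky number.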
How to spot slop fast
- Big claims with thin methods sections. Pretty figures, vague algorithms.
- No public code, or "code available upon request" with missing training scripts.
- Benchmarks chosen for convenience, not rigor. No discussion of data leakage.
- Single-seed results, no variance, no error bars.
- Ablations missing or superficial. Baselines oddly weak or misconfigured.
- Dozens of workshop papers, unclear author roles, inflated citation lists, or references with odd errors.
Perspective
Great work still breaks through. The transformer paper, "Attention Is All You Need," hit NeurIPS in 2017 and changed the trajectory of the field (arXiv). But the current pace punishes thoughtfulness. The fix isn't a grand committee; it's individual groups choosing to do fewer things, better.
If you lead people, make it safe to publish less and validate more. If you're early in your career, resist the publication grind. Strong, clean work outlasts a pile of workshop abstracts.
Level up your evaluation skills
If your team needs a sharper playbook for reproducibility, evaluation, and reporting, explore curated AI training by job role (Complete AI Training). A few focused skills can keep your work out of the slop pile.