Flood of fakes: paper mills, AI forgeries, and science's trust crisis
Trust in science is eroding as paper mills and AI-boosted image fraud surge. Fight back with provenance, raw data, stronger image checks, AI disclosure, and faster review.

Drowning in a sea of fakery
Science runs on trust. Scale broke the old model of knowing every player, but the literature still felt dependable. That confidence is slipping: the volume and sophistication of fakery are outpacing our defenses.
Paper mills: fraud at scale
Paper mills will sell you a manuscript with weak or fabricated data. Analyses suggest their output is doubling roughly every 18 months, and some editors act as brokers to push these papers through. That corrodes the record and punishes honest work.
For background and guidance on spotting and handling paper-mill submissions, see Nature's overview of paper mills and COPE's recommendations.
Image fraud 2.0: AI makes fakes look real
Image reuse, splicing, and figure surgery have been detected for years, often after publication. Now, AI can generate convincing western blots, pathology images, and even nanomaterial micrographs. Early fakes were clumsy; recent ones can fool experts. The weak link is still the figure, but it looks cleaner than ever.
What you can do this week
- Demand provenance by default. Require original instrument files, acquisition logs, microscope/camera metadata, and analysis scripts. Record SHA-256 hashes of raw files at capture and include them in submissions; a hashing sketch follows this list.
- Standardize pre-submission image checks. Run duplication searches, contrast/levels audits, and panel integrity checks (e.g., with ImageJ/Fiji workflows); a quick duplication-triage sketch follows this list. Make two people sign off: the lead author and a lab mate who had no hand in the figure.
- Force transparency on AI use. Authors must disclose whether AI touched any part of the workflow (writing, statistics, image synthesis). Require model name/version, prompts, seeds, and a statement that no synthetic data appears in figures unless clearly labeled as such; a sample disclosure record follows this list.
- Publish the raw data, not just the plot. Deposit unprocessed files and code in a trusted repository with a DOI. Link figure panels to exact file paths and commit hashes in the data availability statement.
- Sharpen peer review. Assign at least one image-literate reviewer. Ask for the raw data behind one randomly chosen key result. If authors can't produce it quickly, pause the review.
- Make post-publication review easy and fast. Provide a visible route for raising image concerns and commit to first decisions within 30 days. Log outcomes publicly.
- Hold journals to a standard. Support delisting titles with lax screening. Favor journals that require raw data, run image forensics, and publish correction/retraction metrics.
- Fix incentives locally. Update promotion and hiring rubrics to value data sharing, replication attempts, preregistration, and software reuse over sheer paper counts.
- Train your team. Run quarterly workshops on image forensics, data provenance, and AI disclosure.
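A few of these steps lend themselves to small scripts. For the provenance hashes, a minimal sketch in Python; the `raw_acquisitions` folder name is an assumption, not a convention, so point it at your instrument's actual output location:

```python
import hashlib
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large instrument files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hash every raw file in an acquisition folder and print one manifest line per file.
raw_dir = Path("raw_acquisitions")  # hypothetical folder name; adapt to your capture directory
for f in sorted(raw_dir.glob("*")):
    if f.is_file():
        print(f"{sha256_file(f)}  {f.name}")
```

Record the output in the lab notebook at capture time; any later edit to a raw file changes its hash.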
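For duplication triage before a full ImageJ/Fiji pass, perceptual hashing flags near-identical panels cheaply. This sketch uses the third-party `imagehash` library; the `figures/` folder and the distance threshold are assumptions to tune against known-good figures from your own lab:

```python
from itertools import combinations
from pathlib import Path

from PIL import Image
import imagehash  # third-party: pip install ImageHash Pillow

THRESHOLD = 5  # max Hamming distance to flag; lower = stricter

# Hypothetical folder of exported figure panels.
figures = sorted(Path("figures").glob("*.png"))
hashes = {f: imagehash.phash(Image.open(f)) for f in figures}

# Compare every pair; small distances suggest reuse, cropping, or light edits.
for a, b in combinations(figures, 2):
    distance = hashes[a] - hashes[b]
    if distance <= THRESHOLD:
        print(f"FLAG: {a.name} vs {b.name} (Hamming distance {distance})")
```

A flag is a prompt for human review, not a verdict; rotations and heavy crops can evade it, which is why the manual audit stays in the loop.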
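And for the AI disclosure, a machine-readable record keeps the requirement auditable. Every field name below is hypothetical; map it onto whatever form the journal actually uses:

```python
import json

# Hypothetical schema; adapt field names to the journal's disclosure form.
ai_disclosure = {
    "used_ai": True,
    "stages": ["writing", "statistics"],   # e.g., writing, statistics, image_synthesis
    "model": "example-model",              # model name as reported by the vendor
    "version": "2025-01-15",               # version string or snapshot date
    "prompts_archived_at": "prompts/",     # path within the deposited repository
    "seeds": [42],                         # random seeds, where applicable
    "synthetic_data_in_figures": False,    # if True, the affected panels must be labeled
}

with open("ai_disclosure.json", "w") as f:
    json.dump(ai_disclosure, f, indent=2)
```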
A minimal integrity pipeline
- Capture: Save raw instrument output to write-once storage; auto-assign IDs and timestamps.
- Hash: Generate and store checksums on capture; include them in lab notebooks and submissions.
- Store: Sync to a versioned repository; separate raw, processed, and presentation layers.
- Review: Run automated image checks and a manual audit on a sample of figures.
- Publish: Link each figure to raw files, scripts, and parameters; label any synthetic content.
- Monitor: Track post-publication flags and respond with time-bound actions.
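The hash and publish steps can share one artifact: a manifest that maps every file to its checksum, committed alongside the data so reviewers can re-hash and diff. A minimal sketch, assuming a hypothetical `project/` root that holds the raw, processed, and script folders:

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def sha256_file(path: Path) -> str:
    """Stream a file through SHA-256 in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

root = Path("project")  # hypothetical layout: raw/, processed/, scripts/, figures/
manifest = {
    "created": datetime.now(timezone.utc).isoformat(),
    "files": [
        {"path": str(p.relative_to(root)), "sha256": sha256_file(p)}
        for p in sorted(root.rglob("*"))
        if p.is_file()
    ],
}

# Commit manifest.json with the data; a reviewer who re-hashes gets a byte-level diff.
Path("manifest.json").write_text(json.dumps(manifest, indent=2))
```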
Signals and metrics that matter
- Percent of submissions with verifiable raw data and analysis scripts.
- Time to produce raw data on request during review (target: under 72 hours).
- Rate of image issues caught pre-publication vs post-publication.
- Correction/retraction latency and transparency of outcomes.
The hard part
"Publish or perish" fuels shortcuts. Changing incentives takes time, but you can reduce the attack surface now with provenance, checks, and fast feedback loops. Trust won't be granted; it will be earned, figure by figure, dataset by dataset.