Stop Feeding the Machine: Let Academia Set the Rules for AI in Research

Scholars, not platforms, should set AI rules for open science. Demand reciprocity and curb scraping so openness boosts quality and equity without feeding closed models.

Published on: Dec 20, 2025

Academic standards for AI use in research should come from scholarly communities, not top-down mandates. That was the message from Tony Ross-Hellauer, who argued that open science values can guide trustworthy, field-specific practices.

The risk is clear: open outputs are being absorbed into large AI models with limited return to the communities that produced them. If open science ends up widening the gap, with big players getting bigger, we have to pause and recalibrate.

Who sets the rules? Scholars, not platforms

Central principles are useful, but science is too diverse for one-size-fits-all rules. Standards should be built by labs, societies, and journals within each field, then shared, compared, and iterated.

That is how you get real adoption: norms that fit disciplinary methods, data types, and risk profiles.

The extraction problem

Many researchers are uneasy with how AI firms scrape papers, methods, and writing styles. The value flow is lopsided: what do they give back versus what they take?

If openness defaults to subsidizing closed systems, we undermine equity, the very thing open science is meant to protect.

Open science and trustworthy AI align

Trustworthy AI and open science share core values: transparency, quality, equity, inclusiveness, and collective benefit. See the EU's Ethics Guidelines for Trustworthy AI and UNESCO's Recommendation on Open Science for reference.

The challenge is execution: making AI in science auditable, reproducible, and fair without burying researchers in admin.

Field-ready standards you can adopt now

  • Disclosure and provenance. Require an "AI use" subsection in the methods and acknowledgements: model name, version, source, date of use, key settings, and what it was used for (editing, code, analysis, figures). Archive prompts, inputs, and generated artifacts as supplementary files when feasible; a minimal machine-readable example appears after this list.
  • Reproducibility by default. Prefer open-source tools and models where possible so peers can rerun the work. Log seeds, checkpoints, and dataset versions. Cite models and datasets as you would cite software and instruments.
  • Bias and error safeguards. Add domain-specific evaluation before results inform claims: bias checks, calibration tests, error analysis, and red-teaming for known failure modes in your field.
  • Compute and sustainability. Report training/inference compute, energy, and carbon estimates for AI-heavy pipelines; a hedged reporting sketch follows this list. Encourage efficiency baselines and review resource intensity in grants and IRBs.
  • Licensing and access. Choose licenses and repository options that reflect your intent. Where appropriate, use machine-readable reservations for commercial text-and-data mining and clarify terms for derivative model training. Coordinate with your library and repository teams.
  • Data protection. For sensitive data, apply data minimization, de-identification, and clear retention policies. Avoid sending proprietary or restricted data to external services without a DPA and documented safeguards.
  • Vendor contracts. Negotiate: no training on your data by default, deletion on request, auditability, export, and provenance features. Prefer on-prem or managed options that pass institutional security review.
  • Editorial and funding policy. Journals and funders should mandate AI transparency, require availability of prompts/outputs where appropriate, and review claims supported by AI with extra scrutiny.
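
To make the disclosure and reproducibility items concrete, here is a minimal sketch of a machine-readable "AI use" record that could be archived alongside other supplementary files. Every field name and example value is an illustrative placeholder, not a community standard.

```python
# Minimal sketch of a machine-readable "AI use" disclosure record that a lab
# could archive as a supplementary file. All field names and values below are
# illustrative placeholders, not a community standard.
import json
from dataclasses import dataclass, asdict
from datetime import date

@dataclass
class AIUseRecord:
    model_name: str            # model family as reported in the methods section
    model_version: str         # exact version or checkpoint identifier
    source: str                # vendor, repository, or local deployment
    date_used: str             # ISO date of use
    purpose: str               # editing, code, analysis, figures, ...
    key_settings: dict         # temperature, seed, dataset version, etc.
    prompts_archived_at: str   # path or identifier of archived prompts/outputs

record = AIUseRecord(
    model_name="example-llm",                       # hypothetical model name
    model_version="2025-06",
    source="hosted API (hypothetical provider)",
    date_used=date(2025, 12, 1).isoformat(),
    purpose="language editing of the discussion section",
    key_settings={"temperature": 0.2, "seed": 42, "dataset_version": "v1.3"},
    prompts_archived_at="supplementary/prompts.zip",
)

# Write the record so it can be bundled with other supplementary materials.
with open("ai_use_disclosure.json", "w") as fh:
    json.dump(asdict(record), fh, indent=2)
```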
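
For the compute and sustainability item, here is a hedged sketch of per-run carbon reporting, assuming the open-source codecarbon package; the project name is a placeholder and the numbers it reports are estimates, not measured emissions.

```python
# Per-run compute/carbon reporting sketch, assuming the open-source
# codecarbon package (pip install codecarbon). Its EmissionsTracker
# estimates energy use and reports emissions in kg CO2-equivalent.
from codecarbon import EmissionsTracker

def run_pipeline() -> None:
    # Placeholder for the AI-heavy training or inference step being measured.
    sum(i * i for i in range(10_000_000))

tracker = EmissionsTracker(project_name="ai-assisted-analysis")
tracker.start()
try:
    run_pipeline()
finally:
    emissions_kg = tracker.stop()  # estimated kg CO2-eq for this run
    print(f"Estimated emissions: {emissions_kg:.6f} kg CO2-eq")
```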

A workable lab playbook

  • Define approved tools and use cases (editing, summarization, code scaffolding), plus banned ones (novel results without verification, sensitive data in unmanaged tools).
  • Create an AI disclosure checklist and a simple model registry (who used what, when, for which task, with which settings); a minimal registry sketch follows this list.
  • Maintain a shared prompt library with good examples and known failure cases; update it after each project.
  • Set "risk gates" in your workflow: independent review before AI-assisted methods, figures, or code make it into manuscripts or data releases.
  • Pilot, measure time saved vs. error introduced, and iterate quarterly.
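
As one way to implement the registry item above, here is a minimal append-only JSON-lines sketch; the file name, field names, and example values are illustrative rather than a prescribed format.

```python
# Append-only lab model registry sketch: one JSON line per AI-assisted task,
# recording who used what, when, for which task, with which settings.
# The file name and field names are illustrative, not a prescribed format.
import json
from datetime import datetime, timezone

REGISTRY_PATH = "ai_model_registry.jsonl"

def log_ai_use(user: str, model: str, version: str, task: str, settings: dict) -> None:
    """Append one registry entry; keep the file under version control with the project."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "model": model,
        "version": version,
        "task": task,
        "settings": settings,
    }
    with open(REGISTRY_PATH, "a") as fh:
        fh.write(json.dumps(entry) + "\n")

# Example entry for an approved code-scaffolding use case.
log_ai_use(
    user="jdoe",
    model="example-llm",      # hypothetical model name
    version="2025-06",
    task="code scaffolding for a data-cleaning script",
    settings={"temperature": 0.0, "prompt_id": "prompt-library/cleaning-v2"},
)
```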

Limit unwanted scraping today

  • Audit where your outputs live (lab sites, repositories, journals). Where possible, use robots.txt and meta directives to curb AI-training crawlers and adopt the machine-readable opt-outs offered by repositories; a minimal robots.txt sketch follows this list.
  • Use clear licenses and TDM reservations where allowed by funders and journals; state expectations for reuse and derivative model training in README files.
  • Prefer API access with fair-use terms over open bulk endpoints when releasing large corpora; log access to support community oversight.
  • Label AI-generated artifacts (figures, text, code) and consider content provenance tools so downstream users can assess trust.
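
As a starting point for the robots directives above, here is a minimal sketch that writes a robots.txt disallowing commonly published AI-training crawler user agents. The agent names are assumptions to verify against each operator's current documentation, and compliance with robots.txt is voluntary, so pair it with repository opt-outs and clear license terms.

```python
# Sketch that emits robots.txt rules aimed at AI-training crawlers.
# The user-agent names are commonly published crawler identifiers
# (verify against each operator's current documentation); robots.txt
# is advisory, so pair it with repository opt-outs and license terms.
AI_TRAINING_CRAWLERS = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended"]

def build_robots_txt(crawlers: list[str]) -> str:
    # One Disallow block per AI-training crawler, then permit everyone else
    # so ordinary search indexing is unaffected.
    blocks = [f"User-agent: {agent}\nDisallow: /" for agent in crawlers]
    blocks.append("User-agent: *\nAllow: /")
    return "\n\n".join(blocks) + "\n"

if __name__ == "__main__":
    with open("robots.txt", "w") as fh:
        fh.write(build_robots_txt(AI_TRAINING_CRAWLERS))
```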

Why this matters

Open science only works if it improves quality, equity, and public benefit. If our practices end up subsidizing opaque systems, we lose trust and, with it, the point of being open.

The fix is not to close up. It's to set community standards, build open tools, and demand reciprocity. That's how we keep openness working for researchers and the public.

If your team needs structured upskilling on responsible AI use in research workflows, explore curated options by role here: AI courses by job.

