Stop Feeding the Machine: Let Academia Set the Rules for AI in Research

Scholars, not platforms, should set AI rules for open science. Demand reciprocity and curb scraping so openness boosts quality and equity without feeding closed models.

Published on: Dec 20, 2025

Academic standards for AI use in research should come from scholarly communities, not top-down mandates. That was the message from Tony Ross-Hellauer, who argued that open science values can guide trustworthy, field-specific practices.

The risk is clear: open outputs are being absorbed into large AI models with limited return to the communities that produced them. If open science ends up widening the gap, with big players getting bigger, we have to pause and recalibrate.

Who sets the rules? Scholars, not platforms

Central principles are useful, but science is too diverse for one-size-fits-all rules. Standards should be built by labs, societies, and journals within each field, then shared, compared, and iterated.

That is how you get real adoption: norms that fit disciplinary methods, data types, and risk profiles.

The extraction problem

Many researchers are uneasy with how AI firms scrape papers, methods, and writing styles. The value flow is lopsided: what do they give back versus what they take?

If openness defaults to subsidizing closed systems, we undermine equity, the very thing open science is meant to protect.

Open science and trustworthy AI align

Trustworthy AI and open science share core values: transparency, quality, equity, inclusiveness, and collective benefit. See the EU's Ethics Guidelines for Trustworthy AI and UNESCO's Recommendation on Open Science for reference.

The challenge is execution: making AI in science auditable, reproducible, and fair without burying researchers in admin.

Field-ready standards you can adopt now

  • Disclosure and provenance. Require an "AI use" subsection in the methods and acknowledgements: model name, version, source, date of use, key settings, and what it was used for (editing, code, analysis, figures). Archive prompts, inputs, and generated artifacts as supplementary files when feasible; a minimal machine-readable example appears after this list.
  • Reproducibility by default. Prefer open-source tools and models where possible so peers can rerun the work. Log seeds, checkpoints, and dataset versions. Cite models and datasets as you would cite software and instruments.
  • Bias and error safeguards. Add domain-specific evaluation before results inform claims: bias checks, calibration tests, error analysis, and red-teaming for known failure modes in your field.
  • Compute and sustainability. Report training/inference compute, energy, and carbon estimates for AI-heavy pipelines; a hedged reporting sketch follows this list. Encourage efficiency baselines and review resource intensity in grants and IRBs.
  • Licensing and access. Choose licenses and repository options that reflect your intent. Where appropriate, use machine-readable reservations for commercial text-and-data mining and clarify terms for derivative model training. Coordinate with your library and repository teams.
  • Data protection. For sensitive data, apply data minimization, de-identification, and clear retention policies. Avoid sending proprietary or restricted data to external services without a DPA and documented safeguards.
  • Vendor contracts. Negotiate: no training on your data by default, deletion on request, auditability, export, and provenance features. Prefer on-prem or managed options that pass institutional security review.
  • Editorial and funding policy. Journals and funders should mandate AI transparency, require availability of prompts/outputs where appropriate, and review claims supported by AI with extra scrutiny.
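
To make the disclosure and reproducibility items concrete, here is a minimal sketch of a machine-readable "AI use" record that could be archived alongside other supplementary files. Every field name and example value is an illustrative placeholder, not a community standard.

```python
# Minimal sketch of a machine-readable "AI use" disclosure record that a lab
# could archive as a supplementary file. All field names and values below are
# illustrative placeholders, not a community standard.
import json
from dataclasses import dataclass, asdict
from datetime import date

@dataclass
class AIUseRecord:
    model_name: str            # model family as reported in the methods section
    model_version: str         # exact version or checkpoint identifier
    source: str                # vendor, repository, or local deployment
    date_used: str             # ISO date of use
    purpose: str               # editing, code, analysis, figures, ...
    key_settings: dict         # temperature, seed, dataset version, etc.
    prompts_archived_at: str   # path or identifier of archived prompts/outputs

record = AIUseRecord(
    model_name="example-llm",                       # hypothetical model name
    model_version="2025-06",
    source="hosted API (hypothetical provider)",
    date_used=date(2025, 12, 1).isoformat(),
    purpose="language editing of the discussion section",
    key_settings={"temperature": 0.2, "seed": 42, "dataset_version": "v1.3"},
    prompts_archived_at="supplementary/prompts.zip",
)

# Write the record so it can be bundled with other supplementary materials.
with open("ai_use_disclosure.json", "w") as fh:
    json.dump(asdict(record), fh, indent=2)
```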
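
For the compute and sustainability item, here is a hedged sketch of per-run carbon reporting, assuming the open-source codecarbon package; the project name is a placeholder and the numbers it reports are estimates, not measured emissions.

```python
# Per-run compute/carbon reporting sketch, assuming the open-source
# codecarbon package (pip install codecarbon). Its EmissionsTracker
# estimates energy use and reports emissions in kg CO2-equivalent.
from codecarbon import EmissionsTracker

def run_pipeline() -> None:
    # Placeholder for the AI-heavy training or inference step being measured.
    sum(i * i for i in range(10_000_000))

tracker = EmissionsTracker(project_name="ai-assisted-analysis")
tracker.start()
try:
    run_pipeline()
finally:
    emissions_kg = tracker.stop()  # estimated kg CO2-eq for this run
    print(f"Estimated emissions: {emissions_kg:.6f} kg CO2-eq")
```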

A workable lab playbook

  • Define approved tools and use cases (editing, summarization, code scaffolding), plus banned ones (novel results without verification, sensitive data in unmanaged tools).
  • Create an AI disclosure checklist and a simple model registry (who used what, when, for which task, with which settings); a minimal registry sketch follows this list.
  • Maintain a shared prompt library with good examples and known failure cases; update it after each project.
  • Set "risk gates" in your workflow: independent review before AI-assisted methods, figures, or code make it into manuscripts or data releases.
  • Pilot, measure time saved vs. error introduced, and iterate quarterly.
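
As one way to implement the registry item above, here is a minimal append-only JSON-lines sketch; the file name, field names, and example values are illustrative rather than a prescribed format.

```python
# Append-only lab model registry sketch: one JSON line per AI-assisted task,
# recording who used what, when, for which task, with which settings.
# The file name and field names are illustrative, not a prescribed format.
import json
from datetime import datetime, timezone

REGISTRY_PATH = "ai_model_registry.jsonl"

def log_ai_use(user: str, model: str, version: str, task: str, settings: dict) -> None:
    """Append one registry entry; keep the file under version control with the project."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "model": model,
        "version": version,
        "task": task,
        "settings": settings,
    }
    with open(REGISTRY_PATH, "a") as fh:
        fh.write(json.dumps(entry) + "\n")

# Example entry for an approved code-scaffolding use case.
log_ai_use(
    user="jdoe",
    model="example-llm",      # hypothetical model name
    version="2025-06",
    task="code scaffolding for a data-cleaning script",
    settings={"temperature": 0.0, "prompt_id": "prompt-library/cleaning-v2"},
)
```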

Limit unwanted scraping today

  • Audit where your outputs live (lab sites, repositories, journals). Where possible, use robots.txt and meta directives to curb AI-training crawlers and adopt the machine-readable opt-outs offered by repositories; a minimal robots.txt sketch follows this list.
  • Use clear licenses and TDM reservations where allowed by funders and journals; state expectations for reuse and derivative model training in README files.
  • Prefer API access with fair-use terms over open bulk endpoints when releasing large corpora; log access to support community oversight.
  • Label AI-generated artifacts (figures, text, code) and consider content provenance tools so downstream users can assess trust.
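
As a starting point for the robots directives above, here is a minimal sketch that writes a robots.txt disallowing commonly published AI-training crawler user agents. The agent names are assumptions to verify against each operator's current documentation, and compliance with robots.txt is voluntary, so pair it with repository opt-outs and clear license terms.

```python
# Sketch that emits robots.txt rules aimed at AI-training crawlers.
# The user-agent names are commonly published crawler identifiers
# (verify against each operator's current documentation); robots.txt
# is advisory, so pair it with repository opt-outs and license terms.
AI_TRAINING_CRAWLERS = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended"]

def build_robots_txt(crawlers: list[str]) -> str:
    # One Disallow block per AI-training crawler, then permit everyone else
    # so ordinary search indexing is unaffected.
    blocks = [f"User-agent: {agent}\nDisallow: /" for agent in crawlers]
    blocks.append("User-agent: *\nAllow: /")
    return "\n\n".join(blocks) + "\n"

if __name__ == "__main__":
    with open("robots.txt", "w") as fh:
        fh.write(build_robots_txt(AI_TRAINING_CRAWLERS))
```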

Why this matters

Open science only works if it improves quality, equity, and public benefit. If our practices end up subsidizing opaque systems, we lose trust and, with it, the point of being open.

The fix is not to close up. It's to set community standards, build open tools, and demand reciprocity. That's how we keep openness working for researchers and the public.

If your team needs structured upskilling on responsible AI use in research workflows, explore curated options by role here: AI courses by job.

