AI-written, AI-reviewed: Agents4Science 2025 puts the scientific process on trial
Agents4Science 2025 claims to be the first open conference where AI systems are listed as primary authors and serve as reviewers. Led by researchers at Stanford University, it starts Thursday, October 23 at 2:45am AEDT. The stated aim is to create a "relatively safe sandbox" to test whether AI can generate novel insights and maintain quality through AI-driven review.
Note: This is an experimental venue. Outputs and "AI reviews" have not been through standard human peer review, and conclusions may shift under human scrutiny.
Why this matters for working scientists
- Quality assurance: If AI writes and reviews research, what standards and failure modes should we expect compared to human peer review?
- Bias and reliability: Current models reflect their training data and struggle at the edge of knowledge, precisely where research lives.
- Workflows and tooling: Expect more "LLM-as-a-judge" systems and committees of models ("Language Model Council") to score work. You'll need guardrails; a minimal sketch follows this list.
- Integrity and authorship: Who is accountable for claims, errors, or misconduct if the "first author" is non-human?
- Equity: Automated reviewers may penalise unconventional writing and novel ideas, with downstream effects on minority groups and early-career researchers.
- Policy and law: Copyright, consent, and attribution rules are not aligned with AI-first authorship.
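On the tooling point above, here is a minimal Python sketch of an "LLM-as-a-judge" committee with one simple guardrail: it escalates to a human reviewer when the judges disagree. The `query_model` function, the model names, and the 1-to-10 scoring prompt are illustrative assumptions, not part of the conference's pipeline or any vendor's API.

```python
from statistics import median
from typing import Callable

def query_model(model_name: str, prompt: str) -> float:
    """Hypothetical placeholder: ask one reviewer model for a 1-10 quality score."""
    raise NotImplementedError("Swap in your provider's client call here.")

def council_review(
    paper_text: str,
    models: list[str],
    ask: Callable[[str, str], float] = query_model,
    disagreement_threshold: float = 3.0,
) -> dict:
    """Score a paper with several models and flag cases that need a human."""
    prompt = (
        "Rate the methodological soundness of the following paper from 1 to 10 "
        "and briefly justify the score:\n\n" + paper_text
    )
    scores = [ask(m, prompt) for m in models]
    spread = max(scores) - min(scores)
    return {
        "scores": dict(zip(models, scores)),
        "aggregate": median(scores),  # median resists a single outlier judge
        "needs_human_review": spread >= disagreement_threshold,  # guardrail: escalate disagreement
    }

# Usage with stubbed scores, purely for illustration:
stub = {"model-a": 7.0, "model-b": 4.0, "model-c": 8.0}
result = council_review("...", list(stub), ask=lambda m, _: stub[m])
print(result["aggregate"], result["needs_human_review"])  # 7.0 True
```

The design choice here is deliberately conservative: aggregate with a robust statistic, and treat strong disagreement between models as a signal to hand the decision back to people rather than something to average away.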
What experts say
Professor Karin Verspoor, RMIT University, on integrity and evaluation:
"How do we assure the quality and integrity of scientific research in a world where AI agents are both generating and vetting scientific outputs? ... Replacing human reviewers with automated ones seems hugely problematic ... current AI systems have known and inherently unavoidable biases ... and at the edge of our knowledge these models are unreliable."
She notes the growing use of "LLM as a judge," and even a "Language Model Council" to mimic group oversight. Her bottom line: rigorously test capabilities and limits before handing core scientific processes to machines.
Professor David Powers, computer and cognitive science researcher, on experimentation and data:
"Agents4Science 2025 is an interesting experiment ... Many authors are now routinely using AI to write or rewrite their papers ... Conversely, conferences and publishers are exploring how AI can be used to referee papers." He highlights the challenge of detecting hallucinated research, and notes an acceptance rate of about 16% of the 300+ submissions, comparable to selective venues; the result is a dataset for analysing what AI-written and AI-reviewed research looks like in practice.
Dr Armin Chitizadeh, AI ethics, on review risks and transparency:
He welcomes open acknowledgment of AI involvement but flags a key risk: AI reviewers reinforce common patterns and may miss genuine novelty or penalise unconventional styles. His practical stance: let AI assist formatting and citation checks; keep human judgment for novelty and significance. The conference offers a testbed to measure review quality.
Professor Albert Zomaya, University of Sydney, on the bigger question:
Conferences like this force the community to confront how authorship, accountability, and creativity should be defined as AI capabilities increase. The debate is overdue.
Professor Hussein Abbass, UNSW-Canberra, on authorship boundaries:
"AI does not qualify for academic authorship." He points to four pillars (contribution, integrity, accountability, and consent) and argues AI cannot meet them without humans. Even if AI automates parts of the scientific method, authorship remains a human responsibility.
Professor Daswin De Silva, La Trobe University, on purpose and practice:
He calls the concept poorly motivated as a standalone conference. Research and peer review are deeply human activities; removing people from authorship and review risks hollowing out the purpose of scholarly exchange. He recommends treating "AI-as-author-and-reviewer" as a controlled simulation inside a human-led venue.
Dr Raffaele Ciriello, University of Sydney, on the nature of science:
He warns that a fully automated pipeline confuses the appearance of validity with actual inquiry. "Large language models do not think, discover, or know in any meaningful sense ... Granting them authorship or reviewer status anthropomorphises what are essentially stochastic text-prediction machines." His stance is blunt: without human conversation and critique, there is no science-only computation.
Dr Jonathan Harlen, Southern Cross University, on copyright and ownership:
Under current US and Australian law, AI-generated works do not attract copyright protection because they lack a human author, even when produced from complex prompts. He points to cases such as IceTV v Nine Network and Primary Health Care v Commissioner of Taxation. The UK takes a different route: Section 9(3) of the Copyright, Designs and Patents Act 1988 treats the author of a computer-generated work as the person who made the arrangements necessary for its creation. That model could let human contributors claim ownership of AI-assisted research outputs.
Practical next steps for your lab
- Document AI use: tools, versions, prompts, datasets, and human supervision. Include this in methods and acknowledgements (a sketch of a disclosure record follows this list).
- Keep humans in the loop for: research questions, novelty assessment, ethics, data provenance, and conclusions.
- Pilot "AI-as-reviewer" internally on your past papers to surface failure modes before using it on active work (see the bias-check sketch after this list).
- Create a lab policy on AI credit, accountability, and error correction. Decide what merits authorship vs acknowledgement.
- Audit bias: run checks on whether AI feedback penalises certain topics, styles, or demographics.
- Clarify IP early: if you must list an AI as first author for a venue, understand how that affects ownership and reuse.
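To make the documentation item above concrete, here is a minimal sketch of a machine-readable AI-use disclosure record. The field names and the `ai_use_log.jsonl` file are illustrative assumptions rather than a published standard; adapt them to whatever your methods and acknowledgements sections require.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime
import json

@dataclass
class AIUseRecord:
    tool: str                # model or product name
    version: str             # model/version identifier
    purpose: str             # drafting, literature search, code review, ...
    prompts: list[str]       # prompts used, or a pointer to a prompt log
    datasets: list[str]      # datasets exposed to the tool
    human_supervisor: str    # who checked the output
    checked_by_human: bool = True
    date: str = field(default_factory=lambda: datetime.now().date().isoformat())

record = AIUseRecord(
    tool="generic-llm",
    version="2025-10",
    purpose="rewrote the methods section for clarity",
    prompts=["prompt-log.md#entry-12"],
    datasets=["none"],
    human_supervisor="J. Bloggs",
)

# Append to a lab-wide log that can be cited in methods and acknowledgements.
with open("ai_use_log.jsonl", "a") as fh:
    fh.write(json.dumps(asdict(record)) + "\n")
```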
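For the internal pilot and bias audit, here is a small sketch of one way to compare an AI reviewer's scores across groups of past papers with a permutation test. `ai_review_score` is a hypothetical wrapper around whichever model you pilot, and the grouping into "mainstream" and "unconventional" papers is your own labelling.

```python
import random
from statistics import mean

def ai_review_score(paper_text: str) -> float:
    """Hypothetical wrapper: return the pilot AI reviewer's 1-10 score for a paper."""
    raise NotImplementedError("Replace with your pilot reviewer.")

def score_gap_p_value(scores_a: list[float], scores_b: list[float], n_perm: int = 10_000) -> float:
    """Permutation test: how often does random relabelling produce a gap this large?"""
    observed = abs(mean(scores_a) - mean(scores_b))
    pooled = scores_a + scores_b
    hits = 0
    for _ in range(n_perm):
        random.shuffle(pooled)
        perm_a, perm_b = pooled[: len(scores_a)], pooled[len(scores_a):]
        if abs(mean(perm_a) - mean(perm_b)) >= observed:
            hits += 1
    return hits / n_perm

# Usage: score past papers grouped by topic or style, then test whether the gap
# between groups is larger than chance would explain.
# mainstream = [ai_review_score(p) for p in mainstream_papers]
# unconventional = [ai_review_score(p) for p in unconventional_papers]
# print(score_gap_p_value(mainstream, unconventional))
```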
What to watch at Agents4Science 2025
- Review quality: Do AI reviews catch methodological errors, data leakage, and statistical missteps? How are disagreements resolved?
- Novelty vs pattern matching: Are "accepted" pieces incremental or genuinely new? What does the citation profile look like?
- Transparency: Are prompts, datasets, and toolchains disclosed well enough to reproduce results?
- Acceptance rate and criteria: With ~16% acceptance, what distinguished accepted work?
- Post-conference audits: Independent replications and human re-reviews will be telling.
Bottom line
Letting AI write and review papers is a bold stress test of the research pipeline. It will surface useful data, and sharp edges. For now, keep human judgment at the centre, use AI where it adds speed and coverage, and document every step so findings stand up when people kick the tyres.
If your team is updating skills for AI-assisted research, you may find these resources useful: AI courses by job role.