AI "scientists" still need human judgment: lessons from Agents4Science 2025
At a one-of-a-kind conference, AI systems were listed as first authors and even reviewers. The goal: see what happens when agents lead research, end to end. The result: useful technical output, but shaky scientific judgment and frequent citation failures.
Agents4Science accepted 47 papers from 300+ submissions. According to co-organizer James Zou (Stanford), the event was created because most journals won't allow AI as co-authors, which makes it hard to be transparent about how researchers actually use these systems.
What actually happened in the studies
- ChatGPT and Claude ran a two-sided job marketplace project, from ideation to experiments. They drifted off-topic, forgot to update supporting documents, hallucinated references, and produced redundant code and prose until human collaborators intervened.
- Google's Gemini analyzed San Francisco's 2020 policy cutting towing fees for low-income drivers. It handled the data processing, UC Berkeley researchers reported, but repeatedly fabricated sources.
How human experts judged the work
Risa Wechsler, a computational astrophysicist at Stanford, said the submissions showed decent technical chops but weak judgment. Some analyses were fine on paper yet uninteresting, or framed questions in ways that didn't make sense, sometimes using methods far too complex for the problem.
James Evans, a computational sociologist at the University of Chicago, warned about the confident tone of current AI systems. When an agent sounds neutral and certain, people tend to stop questioning it, and that is bad news for a process that depends on disagreement and argument to move forward.
Barbara Cheifet, editor at Nature Biotechnology, stressed that hallucinated references are still a major issue. Her stance: treat AI as a colleague, not an author, because humans are responsible for accuracy, originality, and integrity.
What this means for researchers and writers
AI can accelerate parts of research and writing. But without firm constraints, it drifts, fabricates, and overcomplicates. If you use agents in your work, keep your hands on the wheel.
- Keep humans in charge of the question. Let AI explore, but you decide what's interesting, important, and worth testing.
- Reduce method bloat. Start with the simplest baseline that can answer the question. Only add complexity when it clearly beats that baseline.
- Structure the workflow. Break the project into checkpoints: problem framing, literature scan, data plan, analysis, interpretation, write-up. Require a short human approval at each step (a minimal gate sketch appears after this list).
- Force argument, not agreement. Ask the model to critique its own plan, propose alternatives, and list failure modes. If you use multi-agent setups, assign opposing roles.
- Citation hygiene is non-negotiable. Ban auto-citations. Require DOIs or verifiable URLs, and check every reference (see the DOI-check sketch after this list). See Nature's policies on AI authorship and the COPE guidelines.
- Log everything. Save prompts, model versions, seeds, and outputs. Treat agent runs like experiments recorded in a lab notebook (see the logging sketch after this list).
- Guardrails for context. Maintain a single "source of truth" document the agent must read before generating. Require explicit updates to supporting materials.
- Zero tolerance for fabrication. Instruct the model to say "I don't have a source" instead of guessing. Ask for page numbers and quotes for any claim tied to a citation.
- Separate analysis from narration. Have one pass generate the code/analysis and another write the explanation. Then swap: have each pass critique the other's output.
- Reproducibility first. Package data, code, environment, and a plain-English README so another researcher can rerun the work without you.
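To make the checkpoint idea concrete, here is a minimal sketch of an approval gate. The ProjectGate class, the checkpoint names, and the reviewer initials are illustrative assumptions; the point is simply that a later stage cannot start until a human has signed off on every earlier one.

```python
CHECKPOINTS = [
    "problem framing", "literature scan", "data plan",
    "analysis", "interpretation", "write-up",
]

class ProjectGate:
    """Track human sign-off; a stage unlocks only after all earlier ones."""

    def __init__(self) -> None:
        self.approved: dict[str, str] = {}  # checkpoint -> reviewer who signed off

    def approve(self, checkpoint: str, reviewer: str) -> None:
        if checkpoint not in CHECKPOINTS:
            raise ValueError(f"unknown checkpoint: {checkpoint}")
        self.approved[checkpoint] = reviewer

    def can_start(self, checkpoint: str) -> bool:
        earlier = CHECKPOINTS[: CHECKPOINTS.index(checkpoint)]
        return all(c in self.approved for c in earlier)

gate = ProjectGate()
gate.approve("problem framing", "RW")     # human reviewer signs off
print(gate.can_start("literature scan"))  # True: framing is approved
print(gate.can_start("analysis"))         # False: data plan not yet approved
```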
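For the citation check, Crossref's public REST API (https://api.crossref.org/works/{DOI}) returns metadata for any registered DOI, so a basic existence check is easy to script. This is a sketch using only the Python standard library; verify_doi and check_references are our own hypothetical helpers, and a serious checker would also compare the returned titles and authors against what the model claimed.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

CROSSREF_API = "https://api.crossref.org/works/"  # Crossref's public REST endpoint

def verify_doi(doi: str):
    """Return Crossref metadata for a DOI, or None if it does not resolve."""
    url = CROSSREF_API + urllib.parse.quote(doi)  # quote() keeps the '/' in DOIs
    req = urllib.request.Request(url, headers={"User-Agent": "citation-check/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return json.load(resp)["message"]
    except (urllib.error.HTTPError, urllib.error.URLError):
        return None  # unregistered DOI or network failure: flag for manual review

def check_references(dois):
    for doi in dois:
        meta = verify_doi(doi)
        if meta is None:
            print(f"FLAG  {doi}  (not found on Crossref, verify by hand)")
        else:
            title = (meta.get("title") or ["<no title>"])[0]
            print(f"OK    {doi}  {title}")

check_references(["10.1038/s41586-021-03819-2"])  # one real DOI, for illustration
```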
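For the lab notebook, an append-only JSONL file goes a long way. The log_run helper below is a hypothetical sketch: the field names are our assumptions, and the model identifier in the usage example is invented.

```python
import hashlib
import json
import time
from pathlib import Path

NOTEBOOK = Path("agent_notebook.jsonl")  # one append-only file per project

def log_run(prompt: str, model: str, seed, output: str, note: str = "") -> str:
    """Append one agent run to the notebook and return a short content hash."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S%z"),
        "model": model,    # the exact model/version string you called
        "seed": seed,      # None if the provider exposes no seed control
        "prompt": prompt,
        "output": output,
        "note": note,      # why you ran this step, in your own words
    }
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()[:12]
    with NOTEBOOK.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record["hash"]

# Usage: log every call, then cite the hash in your draft notes.
run_id = log_run(
    prompt="Summarize the towing-fee dataset by income bracket.",
    model="example-model-2025-01",  # hypothetical model identifier
    seed=42,
    output="(model output here)",
    note="first pass at descriptive stats",
)
print("logged run", run_id)
```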
Where AI stands right now
From this conference, the pattern is clear: AI is a capable assistant that speeds up analysis and drafting, but it still falls short on choosing meaningful questions, keeping context straight, and citing correctly. That aligns with Wechsler's view that, over the next decade, AI will sit somewhere between "best intern" and "favorite collaborator."
Use agents to move faster. Rely on humans to decide what matters and to verify what's true.
Further practice
If you want structured practice on prompts and agent workflows for research and writing, see our prompt course resources.