Scientists Are Using AI More - And Trusting It Less
Scientists are skeptical by training. The latest preview from Wiley's 2025 report on research and AI shows that skepticism is growing as hands-on use rises.
In 2024, 51% of surveyed scientists were worried about AI "hallucinations." In 2025, that jumped to 64% - even as AI adoption among researchers rose from 45% to 62%. Security and privacy concerns climbed by 11 percentage points, and the share of use cases where researchers believe AI outperforms humans dropped from more than half to less than a third.
Why deeper use is eroding trust
Hallucinations aren't edge cases; they are common failure modes. We've already seen bogus citations in legal filings, misleading clinical suggestions, and fabricated travel details make it into the real world. Higher-capacity models don't automatically fix this; some tests show hallucinations persisting - and in certain settings, getting worse.
There's also an incentive problem. Users prefer confident systems over cautious ones, even if the confidence is misplaced. If a model hedges, engagement drops. If it bluffs, engagement holds. That bias pushes vendors to optimize for fluency and speed, not verifiability.
What this means for your lab
AI is useful for ideation, code scaffolding, and literature triage - but it is not a source of truth. Treat outputs like unreviewed notes from a keen intern: helpful, fast, and error-prone.
- Force citations and provenance: Require sources, DOIs, and links in every answer. Reject unsourced claims.
- Use retrieval over recall: Pair models with your vetted corpora (RAG) and log the exact passages used (see the first sketch after this list).
- Add a second pass: Run a separate "critic" prompt to fact-check names, numbers, units, and references.
- Benchmark tasks, not vibes: Track precision/recall on your actual workflows (screening, summarization, coding) against gold sets (a minimal scoring sketch also follows this list).
- Keep humans in the loop: Assign review ownership. No AI-generated content should bypass a named reviewer.
- Protect data: Use enterprise or on-prem options. Disable training on your prompts and outputs by default.
- Version everything: Log model, temperature, system prompt, retrieval source, and time for each run to ensure reproducibility.
- Red-team the edge cases: Unit conversions, rare diseases, homonyms, negations, out-of-distribution data. That's where errors hide.
- Set stop conditions: Define "no answer" rules. A model that admits uncertainty is valuable - wire your process to accept it.
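To make several of these measures concrete, here is a minimal sketch of how retrieval grounding, forced citations, a critic pass, run logging, and a "no answer" stop condition can be wired together. The `call_model` callable, the log path, and the passage format are placeholders for whatever model client and corpus your lab actually uses, not a specific vendor API.

```python
import json
import time
from typing import Callable, Sequence

# `call_model` stands in for whatever client your lab uses (hosted API,
# local model server, ...). It takes a prompt string and returns text.
ModelFn = Callable[[str], str]

RUN_LOG = "ai_runs.jsonl"  # append-only log for reproducibility

def grounded_answer(question: str,
                    passages: Sequence[dict],   # e.g. {"doi": ..., "text": ...} from your vetted corpus
                    call_model: ModelFn,
                    model_name: str,
                    temperature: float) -> str:
    # 1. Retrieval over recall: the model may only use the passages you supply.
    sources = "\n\n".join(
        f"[{i+1}] (DOI: {p['doi']}) {p['text']}" for i, p in enumerate(passages)
    )
    prompt = (
        "Answer ONLY from the numbered sources below. Cite source numbers and DOIs.\n"
        "If the sources are insufficient, reply exactly: 'Insufficient evidence in sources.'\n\n"
        f"SOURCES:\n{sources}\n\nQUESTION: {question}"
    )
    answer = call_model(prompt)

    # 2. Second pass: a separate "critic" prompt checks claims, numbers, and units.
    critique = call_model(
        "You are a fact-checking critic. Compare the ANSWER against the SOURCES.\n"
        "List every claim that is not directly supported, and verify all numbers and units.\n\n"
        f"SOURCES:\n{sources}\n\nANSWER:\n{answer}"
    )

    # 3. Version everything: model, temperature, prompt, sources, and time for each run.
    with open(RUN_LOG, "a") as f:
        f.write(json.dumps({
            "timestamp": time.time(),
            "model": model_name,
            "temperature": temperature,
            "question": question,
            "source_dois": [p["doi"] for p in passages],
            "answer": answer,
            "critique": critique,
        }) + "\n")

    # 4. Stop condition: an admitted "no answer" is passed through, not papered over.
    if "Insufficient evidence" in answer:
        return "NO ANSWER - insufficient evidence in vetted sources."
    return answer
```

The critique is logged alongside the answer so the named reviewer sees both before anything leaves the lab.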
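For the benchmarking point, the scoring itself is simple; the real work is building the gold set. A minimal sketch for an abstract-screening task, with hypothetical PMIDs standing in for your labelled examples:

```python
def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Compare model-flagged items against a hand-labelled gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# IDs the model flagged as relevant vs. the gold labels your team curated.
model_flagged = {"PMID:111", "PMID:222", "PMID:333", "PMID:444"}
gold_relevant = {"PMID:222", "PMID:333", "PMID:555"}

p, r = precision_recall(model_flagged, gold_relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Track these numbers per workflow and per model version; a model swap that looks harmless in demos will show up here.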
Practical prompts that reduce risk
- "Answer only from the provided sources." If missing, respond: "Insufficient evidence in sources."
- "List every assumption you made." Forces transparency you can audit.
- "Cite with DOI/PMID and quote the exact sentence." Enables instant verification.
- "If two sources conflict, show both and do not resolve." Prevents confident fiction.
Where this is headed
As researchers get closer to the machinery, the shine fades and the utility sharpens. Hype gives way to workflow design, measurement, and governance. That's progress.
If your team needs a structured way to implement safe, verifiable AI workflows in research settings, see our practical AI training by job function.
Bottom line: use AI for speed, but make your systems allergic to unverified claims. Curiosity plus rigor beats confidence every time.