UF researchers find commercial AI text detectors too unreliable for academic use

AI detection tools used in academic settings are unreliable, with false positive rates reaching 68.6%, a University of Florida study found. Researchers say the tools should not be used to make career-affecting decisions.

Published on: May 20, 2026

AI Detection Tools Fail in High-Stakes Academic Settings, Study Finds

Commercially available tools designed to catch AI-generated text in scientific papers are unreliable and should not be used to make career-affecting decisions, according to research from the University of Florida.

Patrick Traynor, a professor and interim chair of the Department of Computer & Information Science & Engineering, examined five popular AI text detectors and found false positive rates ranging from 0.05% to 68.6% and false negative rates between 0.3% and 99.6%. The findings appear in a paper presented this week at the 2026 IEEE Symposium on Security and Privacy.

"These are not reliable or robust tools to use to measure the problem," Traynor said. "People's careers are on the line here."

How the research was conducted

Traynor and co-authors Seth Layton, Bernardo B.P. Madeiros, and Kevin Butler used papers submitted to top-tier security conferences before ChatGPT's release, about 6,000 papers in total. They had large language models generate AI versions of the same papers, then ran both sets through the five most popular commercial detectors.

Two detectors initially performed adequately. But when the researchers made a simple change, asking the LLM to use more complex vocabulary, the detectors failed dramatically. In short, the detectors were easily fooled by fancy words.
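For readers unfamiliar with the two error measures in the study, the short Python sketch below shows how a false positive rate and a false negative rate are typically computed once every paper has a known origin and a detector verdict. It is an illustration only, not the researchers' code: the `DetectorResult` record, the `error_rates` function, and the toy counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DetectorResult:
    """One paper's ground truth and one detector's verdict (hypothetical record)."""
    is_ai_written: bool   # known origin of the paper
    flagged_as_ai: bool   # what the detector reported


def error_rates(results):
    """Return (false_positive_rate, false_negative_rate) for a list of results."""
    human = [r for r in results if not r.is_ai_written]
    ai = [r for r in results if r.is_ai_written]

    # False positive: a human-written paper flagged as AI-generated.
    false_positives = sum(r.flagged_as_ai for r in human)
    # False negative: an AI-generated paper that was not flagged.
    false_negatives = sum(not r.flagged_as_ai for r in ai)

    fpr = false_positives / len(human) if human else 0.0
    fnr = false_negatives / len(ai) if ai else 0.0
    return fpr, fnr


# Toy data, invented for illustration: 10 human papers (2 wrongly flagged)
# and 10 AI-generated versions (3 missed by the detector).
toy_results = (
    [DetectorResult(False, True)] * 2
    + [DetectorResult(False, False)] * 8
    + [DetectorResult(True, False)] * 3
    + [DetectorResult(True, True)] * 7
)

fpr, fnr = error_rates(toy_results)
print(f"false positive rate: {fpr:.1%}, false negative rate: {fnr:.1%}")
```

A high false positive rate means human authors are wrongly accused. A high false negative rate means AI-generated text passes unnoticed.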

The problem with measuring the problem

Media reports claim AI-generated text is increasingly appearing in scientific literature. But Traynor's work raises a fundamental question: how do we actually know?

The detectors themselves rely on the same large language models that researchers might use to generate text. Without reliable detection tools, claims about the prevalence of AI-generated papers cannot be verified.

Nature reported in 2026 that "poor-quality or entirely fabricated research produced by large language models could overwhelm the ability of current quality-control systems to detect it." Traynor's study suggests this conclusion cannot be reliably reached with existing tools.

Stakes in academia

In research careers, publication records directly measure individual merit. An accusation of using AI-generated text, even an unfounded one, can damage a researcher's reputation and derail their career.

"We really can't use them to adjudicate these decisions," Traynor said of the detection tools.

AI's actual potential

The researchers are not opposed to AI in academic work. Large language models can help speed up science and surface new insights, Traynor said. The issue is treating them as infallible.

"It's not an oracle. It doesn't always know the answer," he said. "It's happy to give us answers, but whether or not those answers have value, we still need people to figure that out."

Co-author Layton said the findings should prompt skepticism about all AI-related claims in research. "We demand that such claims include substantial proof that they are correct," Traynor added.
