AI Detection Tools Fail in High-Stakes Academic Settings, Study Finds
Commercially available tools designed to catch AI-generated text in scientific papers are unreliable and should not be used to make career-affecting decisions, according to research from the University of Florida.
Patrick Traynor, a professor and interim chair of the Department of Computer & Information Science & Engineering, examined five popular AI text detectors and found false positive rates ranging from 0.05% to 68.6% and false negative rates between 0.3% and 99.6%. The findings appear in a paper presented this week at the 2026 IEEE Symposium on Security and Privacy.
"These are not reliable or robust tools to use to measure the problem," Traynor said. "People's careers are on the line here."
How the research was conducted
Traynor and co-authors Seth Layton, Bernardo B.P. Madeiros, and Kevin Butler used papers submitted to top-tier security conferences before ChatGPT's release, about 6,000 papers in total. They had large language models generate AI versions of the same papers, then ran both sets through the five most popular commercial detectors.
Two detectors performed adequately initially. But when researchers made a simple change, asking the LLM to use more complex vocabulary, the detectors failed dramatically. The detectors were easily fooled by fancy words.
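For readers unfamiliar with the two error rates the study reports, the following is a minimal sketch of how they are typically computed from a labeled test set. It is not the researchers' code: the function name `error_rates` and the toy data are assumptions made for illustration. A false positive is a human-written paper flagged as AI, and a false negative is an AI-generated paper that passes as human.

```python
from typing import Sequence, Tuple


def error_rates(is_ai: Sequence[bool], flagged: Sequence[bool]) -> Tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate) for one detector.

    is_ai[i]   -- ground truth: True if paper i was generated by an LLM
    flagged[i] -- detector verdict: True if the detector called paper i AI-written
    Hypothetical helper for illustration only.
    """
    if len(is_ai) != len(flagged):
        raise ValueError("is_ai and flagged must have the same length")

    human_total = sum(1 for truth in is_ai if not truth)
    ai_total = sum(1 for truth in is_ai if truth)

    # Human-written papers wrongly flagged as AI.
    false_pos = sum(1 for truth, flag in zip(is_ai, flagged) if not truth and flag)
    # AI-generated papers the detector failed to flag.
    false_neg = sum(1 for truth, flag in zip(is_ai, flagged) if truth and not flag)

    fpr = false_pos / human_total if human_total else 0.0
    fnr = false_neg / ai_total if ai_total else 0.0
    return fpr, fnr


# Toy data (invented): four human-written papers followed by four AI versions.
truth = [False, False, False, False, True, True, True, True]
verdicts = [True, False, False, False, True, True, False, False]
fpr, fnr = error_rates(truth, verdicts)
print(f"false positive rate: {fpr:.1%}, false negative rate: {fnr:.1%}")
```

In this toy run, one of four human-written papers is wrongly flagged and two of four AI versions slip through, which is the kind of outcome that would make a detector unsuitable for judging any individual author.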
The problem with measuring the problem
Media reports claim AI-generated text is increasingly appearing in scientific literature. But Traynor's work raises a fundamental question: how do we actually know?
The detectors themselves rely on the same large language models that researchers might use to generate text. Without reliable detection tools, claims about the prevalence of AI-generated papers cannot be verified.
Nature reported in 2026 that "poor-quality or entirely fabricated research produced by large language models could overwhelm the ability of current quality-control systems to detect it." Traynor's study suggests this conclusion cannot be reliably reached with existing tools.
Stakes in academia
In research careers, publication records directly measure individual merit. An accusation of using AI-generated text, even an unfounded one, can damage a researcher's reputation and derail their career.
"We really can't use them to adjudicate these decisions," Traynor said of the detection tools.
AI's actual potential
The researchers are not opposed to AI in academic work. Large language models can help speed up science and surface new insights, Traynor said. The issue is treating them as infallible.
"It's not an oracle. It doesn't always know the answer," he said. "It's happy to give us answers, but whether or not those answers have value, we still need people to figure that out."
Co-author Layton said the findings should prompt skepticism about all AI-related claims in research. "We demand that such claims include substantial proof that they are correct," Traynor added.