In a test by the Authors Guild, AI detectors from Pangram and Grammarly correctly identified every piece of human writing as human, while Sidekicker flagged every article as AI-generated. The findings underscore a risk that writers have long feared: a single false positive can cost an author their contract and reputation.
The Guild used ten articles published between 2020 and 2022, before generative AI went mainstream. Originality.ai also performed well, but Sidekicker delivered the worst results: every article was flagged as mostly AI-generated, with two scoring 100 percent. ZeroGPT was also unreliable, reporting AI percentages, sometimes high ones, for all of the human-written texts.
The limits of detection tools
The Authors Guild, the oldest and largest professional organization for writers, warns that even the best-performing tools should never be the sole basis for any decision. These tools change constantly, and their accuracy can't be taken for granted.
Pangram CEO Max Spero said his detector is a black box with no way to explain why a text gets flagged as AI-generated. Language models do give themselves away through uniformity, though, especially in how they build arguments. Humans write with far more variety, Spero said.
Professionally written texts share many of the same statistical patterns as AI output, according to the Authors Guild, simply because language models were trained on exactly that kind of writing. False results can cost authors their contracts and their reputations, so publishers should disclose their methods and always give authors a chance to defend themselves.
This creates a troubling paradox. A writer who has spent decades honing clarity, economy, and precision is, by definition, writing in a way that overlaps with what AI has learned to produce. Detection tools cannot distinguish between a human writer who has mastered the craft and a machine that has learned to imitate it, because at the level these tools operate, there may be little difference to find.
The fact that Pangram and Originality reliably identify human-written texts as human doesn't necessarily mean they're equally good at catching AI-generated ones. The results mainly show that these tools are tuned to minimize false positives, avoiding cases where human text gets wrongly flagged as AI. Plenty of texts written by or with AI could still slip through undetected. The reliability shown in this test applies first and foremost to correctly recognizing human writing.
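The distinction can be made concrete with a little arithmetic. The sketch below is illustrative only: the class and function names are hypothetical, and the only numbers used are the ones reported in the test (ten human-written articles, none flagged by Pangram, all flagged by Sidekicker). It shows that a test set containing only human writing lets you compute a false positive rate, but leaves the detection rate for AI text undefined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DetectorResult:
    """Hypothetical container for one detector's counts in a test."""
    name: str
    human_texts: int            # human-written samples tested
    human_flagged_as_ai: int    # of those, how many were flagged (false positives)
    ai_texts: int = 0           # AI-written samples tested (none in this test)
    ai_flagged_as_ai: int = 0   # of those, how many were caught (true positives)


def false_positive_rate(r: DetectorResult) -> float:
    """Share of human-written texts wrongly flagged as AI."""
    return r.human_flagged_as_ai / r.human_texts


def recall(r: DetectorResult) -> Optional[float]:
    """Share of AI-written texts correctly flagged; None if none were tested."""
    if r.ai_texts == 0:
        return None  # cannot be measured without AI-written samples
    return r.ai_flagged_as_ai / r.ai_texts


# Counts taken from the article: ten human-written articles per detector.
pangram = DetectorResult("Pangram", human_texts=10, human_flagged_as_ai=0)
sidekicker = DetectorResult("Sidekicker", human_texts=10, human_flagged_as_ai=10)

for r in (pangram, sidekicker):
    print(r.name, "false positive rate:", false_positive_rate(r), "recall:", recall(r))
```

A detector that never flagged anything would also score a zero false positive rate on this test set, which is why the results alone cannot show how well any tool catches AI-generated text.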
The cultural debate behind detection
Errors will keep happening, and that's why the usefulness of these detectors keeps getting questioned. This is especially true since AI can be a genuinely useful writing tool, and the broader debate often conflates using AI to write with using AI to think.
Detector advocates like Spero justify their business model by pointing to a social contract between writer and reader. "The writer invests time and effort to shape an idea; the reader invests time to engage with it. If AI drops the cost of writing to zero, bad incentives follow, and people flood the internet with worthless content that takes readers more time to consume than it took the author to produce," Spero said.
Whether a piece of writing gets its value from the typing or from the topic selection, the idea, the perspective, the story, the research, the argument, and the judgment behind it is a different question entirely. So is whether AI text detection can actually do anything about the flood of worthless content.
Why this matters for writers
For writers, the test is a reminder that even detectors with perfect human-text recognition can still miss AI-generated content. The real threat isn't just false positives; it's the erosion of trust when publishers rely on flawed tools without transparency. Writers should push for clear disclosure policies and for a guaranteed opportunity to defend their work. Understanding the limitations of AI detection is part of a broader need to adapt to AI in the profession, a topic covered in resources like AI for Writers.