Big Tech vs. Writers: How Our Books Fueled AI, and What to Do Next
I got a call: be the representative plaintiff in class-action cases against tech giants for using copyrighted books, mine included, to train AI without permission. I said yes. That decision wasn't about hating technology. It was about consent, compensation and control.
If you write for a living, your work likely helped build systems now competing with your voice. This is about protecting your catalog, your income and your identity on the page.
How your writing ended up in training data
AI companies scraped the public internet: news sites, forums, social feeds, Wikipedia and more. That still wasn't enough. To learn clean language and long-form structure, they needed books.
That's where shadow libraries and bundles came in. One dataset, Books3, pulled nearly 200,000 pirated titles from a tracker called Bibliotik. It was later folded into a mega-collection known as The Pile, used by researchers and companies to train large language models.
These models weren't just taught grammar. They were trained on our tone, our structure and our decision-making at the sentence level. The goal wasn't to copy a paragraph; it was to reproduce a voice on demand.
What the lawsuits allege
Two core claims are on the table. First: companies knowingly used pirated copies of books to train models. Second: they tried to conceal it, by stripping copyright-management information and designing systems to avoid disclosing what data they consumed.
In Canada, making unauthorized copies of protected work for commercial use is prohibited. Training a model requires making multiple copies. The legal question is simple: did they obtain permission? The practical answer, for most writers, is no.
The counterarguments-and why they miss the point
You'll hear: "This is like teaching a gifted child to write." It isn't. A human learns ideas and values; a model learns patterns and probabilities. One cares. The other simulates caring.
You'll also hear: "We don't regurgitate books verbatim." That's not the problem. The problem is imitation at the level of style. Call it the doppelautor effect: not a clone, but a replicant that sounds uncannily like you. That's new territory for copyright and livelihood.
This isn't just about money-but money matters
Writing income is already thin. Early settlements elsewhere hint at a few thousand dollars per title: better than nothing, but not a match for ongoing use. The harm isn't a single act. It continues as long as models built on our work generate revenue.
Fair compensation looks like recurring payment, not a one-off check. If human intelligence is the input, we should share in the output.
Why this affects every writer
AI usage is everywhere. In one U.S. survey, almost a third of adults reported interacting with it daily, and experts think the true number is much higher. For writers, that means more "good enough" content competing with your voice for your clients and your readers.
There are broader costs too. Electricity demand from AI data centers is surging, with warnings it could rival the consumption of entire countries by decade's end, which brings real environmental trade-offs. This isn't a toy; it's infrastructure that consumes culture, labor and electricity.
- Pew Research on everyday AI use
- International Energy Agency on data center and AI electricity demand
What to do this week
- Audit your catalog. Search your book titles with terms like "Books3," "Bibliotik," or "The Pile." Look for mirrors, metadata lists, or GitHub repos referencing the datasets.
- Lock down your contracts. Add clauses that prohibit AI training or require a separate license and fee. Cover ebooks, audiobooks, PDFs, and excerpts.
- Send takedowns. If you find pirated copies, file DMCA/notice-and-takedown requests with hosts and search engines. Shadow libraries often use "remove on proof" policies-use them.
- Block crawlers on your site. In robots.txt, disallow AI-related user agents such as GPTBot (OpenAI), CCBot (Common Crawl) and Google-Extended. Example entries to discuss with your webmaster: `User-agent: GPTBot` followed by `Disallow: /`, repeated for each agent.
- Label and watermark. Embed clear copyright notices in your ebooks and metadata. Track unique phrases that can help you identify reuse.
- Join collective action. Support writer organizations, class actions, and policy efforts pushing for consent, transparency and payment.
- Decide your license stance. If you're open to training use, set terms and rates. If you're not, publish an explicit policy on your site.
- Monitor for imitators. Prompt major chatbots to "write in the style of [Your Name]" and save results. Document close matches for your records.
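The crawler-blocking step above, written out as a complete robots.txt file, looks like this. GPTBot, CCBot and Google-Extended are real, documented user agents, but compliance with robots.txt is voluntary and other AI crawlers exist, so treat this as a starting point rather than a guarantee:

```
# robots.txt - ask known AI training crawlers to skip the whole site
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

Place the file at the root of your domain (e.g. yoursite.com/robots.txt); directives elsewhere are ignored.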
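The "audit your catalog" step above can be sketched as a small script. This is a hedged illustration, not a tool the article provides: the `dataset_titles` list here is a hypothetical stand-in for whatever Books3/Bibliotik metadata listing (filenames or title indexes) you manage to locate, and the matching is deliberately loose so punctuation or casing differences don't hide a hit.

```python
import re

def normalize(title: str) -> str:
    """Lowercase and strip punctuation so near-identical titles still match."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()

def find_matches(my_titles, dataset_titles):
    """Return your titles that appear (after normalization) in a dataset listing."""
    dataset = {normalize(t) for t in dataset_titles}
    return [t for t in my_titles if normalize(t) in dataset]

# Hypothetical data for illustration only.
my_titles = ["The Quiet Harbour", "Notes on Winter"]
dataset_titles = ["the quiet harbour", "Some Other Book"]
print(find_matches(my_titles, dataset_titles))  # → ['The Quiet Harbour']
```

In practice you would load the dataset listing from a file rather than a literal list, and treat any match as a lead to verify by hand, not proof on its own.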
Use AI on your terms (without losing your voice)
AI is a tool, not your co-author. Use it for admin, outlines, or fact-checking sprints-but do the actual writing yourself. Protect your voice the way musicians protect their tone.
Want a vetted overview of practical tools for writing workflows? Start here: AI tools for copywriting. Use what helps, skip what dilutes your style.
What needs to change
Permission first. Clear reporting of training data. Real licensing and recurring compensation linked to ongoing model use. An independent body that represents writers in AI negotiations.
This isn't a tech-vs.-Luddite story. It's about who gets to decide how human work is used, and who gets paid when that work becomes someone else's product.
A closing note for working writers
Keep writing. Keep receipts. Say yes to useful tools and no to silent extraction. The models were trained on us. If they keep earning, we should keep earning too.