My Books Flopped. A Chatbot Trained on Them. Where's My Cut?
Unlicensed A.I. training on your books demands payment, a position now backed by billion-dollar settlements. Price for access, risk, and substitution, with minimums, upside, logs, and limits.

What Should I Get Paid When a Chatbot Eats My Books?
Your books are assets. If an A.I. company ingests them to train a model, that's a commercial use, and commercial use requires payment. With high-profile settlements hitting the headlines, the real question is no longer "if" but "how much."
Here's a clear framework to value your catalog, protect it, and negotiate from strength.
The news that matters for writers
On Sept. 5, a major A.I. company reportedly settled a class-action case with authors for $1.5 billion after being accused of training on pirated ebooks sourced from Library Genesis in 2021 and Pirate Library Mirror in 2022. The signal is simple: Unlicensed ingestion has a price. That sets a precedent for compensation models and tighter guardrails on training data.
How to think about "fair pay" for training on your books
Pricing should reflect three things: the value of your input, the risk and rights you're giving up, and the economic impact on your sales. A flat fee misses the long-term value your work creates inside a model. A usage-only royalty ignores your up-front risk.
A practical approach blends a minimum guarantee with variable upside. Treat your catalog like a dataset license, not a one-off reprint.
A simple payout framework you can use
- Access fee (one-time): A base payment to ingest and process your works. Anchor per title or per 10,000 words.
- Training fee (per word or per title): Higher rates for in-print, recent, or bestselling works; lower for backlist or out-of-print.
- Popularity multiplier: Weighted by verified sales, citations, or catalog size (e.g., 1.0 for standard titles, 1.5-3.0 for strong sellers).
- Substitution premium: Extra pay if the model can summarize, imitate, or answer questions that compete with your book's core value.
- Transparency and audit: Payment tied to dataset logs, title-level inclusion, and explicit model versions. No logs, no license.
- Term and scope: Limited duration (e.g., 2-3 years), specific model families, no sublicensing without consent, and deletion/unlearning on termination.
Think in packages: a minimum guarantee per title or per 100,000 words, plus a revenue share on model subscriptions or API usage attributed to your dataset cohort. If they can't attribute usage, push for higher minimums and stricter terms.
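To make the blend concrete, here is a minimal sketch of that math in Python. Every rate, multiplier, and floor below is a hypothetical placeholder, not a market benchmark; substitute your own anchors.

```python
# Minimal sketch of the blended payout framework above.
# Every rate, multiplier, and floor is hypothetical; substitute your own anchors.

def title_payout(words, popularity=1.0, substitutable=False,
                 access_fee=250.0,           # one-time ingestion fee per title (assumed)
                 rate_per_10k_words=5.0,     # training fee per 10,000 words (assumed)
                 substitution_premium=0.25,  # +25% when outputs compete with the book
                 minimum_guarantee=500.0):   # per-title floor (assumed)
    """Access fee + training fee x popularity multiplier, plus a
    substitution premium, never below the minimum guarantee."""
    training_fee = rate_per_10k_words * (words / 10_000) * popularity
    subtotal = access_fee + training_fee
    if substitutable:
        subtotal *= 1 + substitution_premium
    return max(subtotal, minimum_guarantee)

# A 90,000-word strong seller (1.5x multiplier) whose summaries
# could substitute for purchase:
print(title_payout(90_000, popularity=1.5, substitutable=True))  # -> 500.0
```

Note that with these toy inputs the variable pieces land below the floor, so the minimum guarantee binds; that is exactly why you pair a floor with upside.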
Negotiation checklist if an A.I. company approaches you
- Scope: What titles, editions, translations, and formats are included?
- Use: Training only, or also fine-tuning, evals, safety tuning, and retrieval?
- Outputs: Prevent verbatim reproduction, derivative style cloning, or book-length summaries that substitute for purchase.
- Attribution: Require dataset credit and machine-readable source metadata.
- Reporting: Title-level inclusion report, model/version mapping, and periodic updates.
- Unlearning: Clear deletion process, with confirmation, if rights revert or the deal ends.
- Money: Minimum guarantee plus a variable component; most-favored-nation (MFN) parity with peers; late-payment penalties.
- Safety and security: No redistributing your files; access control; breach notification.
Protect your catalog now
- Register copyright: File your titles and editions; keep proofs of authorship and publication dates. See the U.S. Copyright Office's A.I. resource page for current guidance.
- Clean metadata: Centralize ISBNs, editions, word counts, TOCs, and publication status for fast claims.
- Monitor piracy: Set alerts for PDFs/EPUBs; send takedowns to pirate mirrors and file hosts.
- Control web crawling: Use robots directives on your site to restrict known A.I. crawlers (a sample robots.txt follows this list); add clear terms of use.
- Evidence locker: Keep dated copies of your files, contracts, sales reports, and distribution records.
- Collective action: Track class actions and join coordinated efforts via trusted organizations like the Authors Guild.
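For the crawling item above, a minimal robots.txt sketch might look like this. GPTBot (OpenAI), CCBot (Common Crawl), Google-Extended (Google), and ClaudeBot (Anthropic) are documented crawler tokens, but the roster changes over time, so verify each vendor's current guidance; robots directives are also voluntary, so they deter compliant crawlers, not pirates.

```
# Sample robots.txt: block known A.I. training crawlers site-wide.
# Token list is illustrative and changes; verify before relying on it.

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /
```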
How to estimate a number before you negotiate
Start with a floor per title that would make you whole if sales dip due to model substitution. Then add a training premium based on recency and popularity. For multi-book catalogs, bundle a minimum guarantee for the set plus escalators tied to usage or future model launches.
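To put a number on that logic, here is a back-of-the-envelope sketch; every input is an assumption you'd replace with your own sales data.

```python
# Back-of-the-envelope per-title floor: expected substitution loss
# over the license term, plus a training premium. All inputs assumed.

annual_units = 800         # recent yearly unit sales (assumed)
royalty_per_unit = 2.10    # author earnings per copy (assumed)
substitution_rate = 0.10   # share of sales at risk from model answers (assumed)
term_years = 3             # proposed license term
training_premium = 300.0   # recency/popularity premium (assumed)

floor = annual_units * royalty_per_unit * substitution_rate * term_years + training_premium
print(f"Per-title floor: ${floor:,.2f}")  # -> Per-title floor: $804.00
```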
If they refuse reporting, raise the floor and shorten the term. Lack of transparency is a cost; price it in.
What to do if your work was trained without permission
- Document evidence: title lists, dates, and any model outputs that quote or closely imitate your text.
- Preserve samples and logs; record prompts and responses with timestamps.
- Contact a qualified attorney or your guild; review ongoing claims and deadlines.
- Request dataset inclusion disclosures and deletion where available.
Turn A.I. into leverage, not a threat
Use A.I. tools to scale your legitimate outputs: study guides, lesson plans, newsletters, and short-form ideas that lead back to your books. Build formats A.I. can't easily replace: author commentary, workshops, community Q&A, and serialized updates tied to your owned audience.
If you want structured learning and tools curated for writers, explore practical resources such as AI courses organized by job and vetted lists of AI tools for copywriting.
Bottom line
Your books have enduring value as training data. Price for access, risk, and substitution. Ask for logs, limit scope and term, and secure a meaningful minimum, then share upside when their models benefit from your work.
"Exposure" isn't payment. Clear contracts and enforceable terms are.