Writers Sue AI Giants Over Pirated Books Used to Train Models

Writers sued Anthropic, Google, OpenAI, Meta, xAI, and Perplexity, claiming their models leaned on pirated books. They want payouts, dataset disclosure, and a better deal than $3k.

Categorized in: AI News, Writers
Published on: Dec 24, 2025

A new lawsuit filed by a group of writers targets Anthropic, Google, OpenAI, Meta, xAI, and Perplexity, alleging their models were trained on pirated copies of books. The plaintiffs include prominent authors such as John Carreyrou, the journalist who exposed Theranos and wrote "Bad Blood." The action mirrors earlier suits focused on unauthorized book use in AI training.

As previously reported by tech media, the dispute turns on a key tension: a prior court noted that some training uses may be lawful, while book piracy itself is illegal. The writers argue that using stolen copies to train systems that generate massive profits crosses an obvious line. They want accountability, transparency, and compensation that reflects the value of their work.

The complaint also challenges the proposed Anthropic settlement, which has reportedly offered some authors around $3,000 out of a $1.5 billion total. Many view that as too little for systemic misuse and the ongoing economic impact on authors' livelihoods. The suit states the deal "seems to serve AI companies, not creators."

"LLM companies should not be able to easily settle thousands of costly claims at discounted prices, hiding the true cost of their large-scale intentional violations," the filing says. It also references potential criminal elements under specific Ukrainian statutes-claims that will likely be scrutinized for relevance and jurisdiction as the case proceeds.

Why this matters for writers

  • Your books are data, and datasets are money. If models are trained on pirated copies, that devalues legitimate licensing and your future earnings.
  • Settlements set precedent. Low payouts can normalize discounted liabilities and weaken bargaining power for everyone else.
  • Disclosure is leverage. If discovery forces companies to reveal training sources, it strengthens collective claims and shapes fair licensing standards.

What you can do now

  • Register your copyrights for every book, edition, and significant revision. It strengthens enforcement and damages.
  • Update your contracts. Add explicit clauses that prohibit AI training use without paid licenses, including clear definitions of "training," "fine-tuning," and "embedding."
  • Monitor piracy. Track unauthorized PDFs and EPUBs across major file-sharing sites and forums. Document URLs, timestamps, and copies for evidence (a minimal logging sketch follows this list).
  • Join collective actions or guild efforts. Individual cases are costly; collective claims increase pressure and reduce legal spend.
  • Use clear licensing terms on your site and storefronts. Include "no AI training" language and machine-readable signals where possible (see the robots.txt sketch after this list).
  • Keep a paper trail. Save drafts, publication dates, ISBNs, and sales records. Provenance matters when calculating damages.
  • Consider controlled syndication. If you license excerpts, do it with narrow scope, time limits, and explicit bans on AI training.
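
For the "Monitor piracy" item above, here is a minimal sketch of the kind of evidence log described: record the URL, a UTC timestamp, a SHA-256 hash, and a saved copy of each unauthorized file. The file names, fields, and use of the requests library are illustrative assumptions, not a legal standard; ask counsel what evidence format will hold up.

```python
# Minimal piracy-evidence log (illustrative sketch, not legal advice).
# Assumes Python 3 with the third-party "requests" package installed.
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path

import requests

LOG_FILE = Path("evidence_log.csv")   # hypothetical file names
COPY_DIR = Path("evidence_copies")

def record_unauthorized_copy(url: str, note: str = "") -> str:
    """Download a suspected pirated file, save a copy, and append
    URL, UTC timestamp, SHA-256 hash, and a note to a CSV log."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()

    digest = hashlib.sha256(response.content).hexdigest()
    timestamp = datetime.now(timezone.utc).isoformat()

    # Keep the copy itself, named by its hash so it can be matched later.
    COPY_DIR.mkdir(exist_ok=True)
    (COPY_DIR / f"{digest}.bin").write_bytes(response.content)

    # Append one row per sighting; write a header the first time.
    is_new = not LOG_FILE.exists()
    with LOG_FILE.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        if is_new:
            writer.writerow(["url", "retrieved_utc", "sha256", "note"])
        writer.writerow([url, timestamp, digest, note])
    return digest

# Example with a placeholder URL:
# record_unauthorized_copy("https://example.com/stolen-copy.epub",
#                          note="EPUB of 2021 edition, found on forum X")
```

Even a simple log like this, kept consistently, makes it far easier to show when and where unauthorized copies circulated.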
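For the machine-readable signals mentioned in the licensing item, a robots.txt file is the most common option. The sketch below generates one that disallows several publicly documented AI crawlers; the crawler names listed are assumptions current at the time of writing and change over time, so verify them against each vendor's documentation.

```python
# Sketch: generate a robots.txt that asks common AI crawlers not to fetch
# your site. Crawler names are illustrative and change; verify with vendors.
AI_CRAWLERS = [
    "GPTBot",           # OpenAI
    "Google-Extended",  # Google's AI-training control token
    "ClaudeBot",        # Anthropic
    "CCBot",            # Common Crawl
    "PerplexityBot",    # Perplexity
]

def build_robots_txt(crawlers=AI_CRAWLERS) -> str:
    """Return robots.txt text disallowing each listed user agent."""
    blocks = [f"User-agent: {name}\nDisallow: /" for name in crawlers]
    return "\n\n".join(blocks) + "\n"

if __name__ == "__main__":
    with open("robots.txt", "w", encoding="utf-8") as fh:
        fh.write(build_robots_txt())
```

Keep in mind that robots.txt is a voluntary signal, not a license term, so pair it with explicit "no AI training" language in your site terms and contracts.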

What to watch in this case

  • Settlement structure and opt-outs: Will authors be allowed to opt out and pursue individual claims with higher upside?
  • Discovery on datasets: Will the court compel disclosure of sources, scraping pipelines, and alleged reliance on pirated repositories?
  • Damages model: Statutory vs. actual damages, and whether ongoing model outputs amplify harm.
  • Future licensing frameworks: Could we see standardized rates for training, fine-tuning, and downstream commercial uses?
  • Regulatory pressure: Antitrust and data-access scrutiny may influence how AI firms acquire content and compensate rights holders.

Context and links

  • Background reporting on authors' lawsuits against AI training practices can be found at TechCrunch.
  • The European Commission has opened an antitrust investigation into Google's use of content for AI model training, focusing on access and fairness: European Commission press corner.

Related updates

  • Anthropic agreed to pay $1.5 billion to settle a class-action lawsuit over unauthorized use of authors' books to train Claude. It's a milestone deal that could influence future payouts and terms.
  • Adobe faces a class action alleging the use of pirated books, including works by Elizabeth Lyon, to train its SlimLM model. Expect more scrutiny on dataset provenance across the industry.
  • The European Commission's antitrust probe into Google's AI training practices zeroes in on data access and potential unfair conduct, an angle that could reshape how content is sourced.

A practical way forward for authors

  • Push for transparency and paid licenses, individually and through professional organizations.
  • Audit your rights stack, tighten agreements, and treat your backlist like an active portfolio.
  • Stay informed on AI's impact and your options. Practical, job-specific training resources can help you adapt without giving away your rights: Complete AI Training by job.

Bottom line: treat your catalog like the asset it is. Guard it, price it, and don't accept terms that make stolen copies the default training set for billion-dollar models.

