John Carreyrou and Six Writers Sue AI Giants for Training on Pirated Books

John Carreyrou and fellow authors are suing major AI firms, saying their models trained on pirated books from shadow libraries. They seek clean data, licensing, and real consent.

Categorized in: AI News, Writers
Published on: Dec 24, 2025

John Carreyrou and 6 Writers Sue AI Companies Over Pirated Books

Six top AI developers - Anthropic, Google, OpenAI, Meta, xAI, and Perplexity - are being sued by a group of authors including John Carreyrou. The claim: their models were trained on pirated copies of books taken from shadow libraries like Library Genesis and Z-Library. No permission, no credit, no compensation.

For writers, this isn't abstract. If the suit succeeds, it could force model makers to disclose data sources, pay for licenses, and purge tainted datasets. If it fails, expect more "pay a little, copy a lot" offers.

Who Is Suing, and On What Specific Legal Grounds

The plaintiffs say full books were copied into training corpora without a license, then used to improve commercial models that now sit inside assistants, search, and social products. They argue two core harms: unauthorized reproduction of entire works and model memorization that can regurgitate recognizable passages.

The companies named span the stack. Anthropic and OpenAI run mainstream assistants. Google and Meta integrate models into search and social. xAI pushes frontier-scale systems. Perplexity operates a conversational search engine. The alleged common thread: using pirated book datasets to boost fluency and factual recall - then monetizing the outputs.

A New Front After a Disputed Settlement Offer

This filing is separate from another class action against Anthropic, where a judge found that training on books could qualify as fair use while acquiring them from pirate libraries could not. A proposed $1.5 billion settlement in that case could pay around $3,000 per eligible work - a figure many see as out of touch with the value of long-form works.

The new suit aims for structural remedies, not just checks. Plaintiffs want the courts to require clean data pipelines, disclosures of provenance, the end of shadow-library sourcing, and paid licenses going forward. In plain terms: change the inputs, not just the PR.

Fair Use Questions and the Problem of Memorization

There's a live debate: Is ingesting entire books for model training a transformative fair use, or a market substitute at industrial scale? Courts allowed snippet scanning for search and indexing in Authors Guild v. Google, but text-generating systems raise fresh risks - long passages and style can surface on demand.

Research backs the concern. Studies show large language models can output verbatim text from training data with simple prompts, and repetition during training raises the odds. See, for example, Carlini et al.'s work on extraction risks (arXiv). The now-infamous Books3 set, believed to contain hundreds of thousands of scraped titles, has been cited in model research and is a likely flashpoint.

Policy is catching up. The U.S. Copyright Office urged steps such as transparency requirements and licensing or compensation mechanisms for training on copyrighted works (Copyright Office: AI Initiative).

Licensing Is Spotty but Slowly Emerging Across AI

Some developers are cutting deals with news outlets and publishers. OpenAI has agreements with The Associated Press, Axel Springer, and the Financial Times. Google has made content deals tied to news products. Image-model developers, for their part, have signed licensing deals with stock-photo providers to contain similar risks.

Books are a different story. Few comprehensive licenses exist for long-form works, leaving a messy patchwork. If courts treat training on pirated books as willful infringement, damages could run up to $150,000 per work - a direct incentive to clean up datasets or start paying.

What This Means for AI Companies and Creators

If courts require audits, provenance disclosures, or destruction of tainted sets, model timelines and costs will shift. Expect heavier compliance: opt-out registries, dataset "nutrition labels," and alignment with the transparency demands of the E.U. AI Act.

For writers, the case is about leverage. You want a say in whether your books train systems that compete for your readers - and fair terms if they do. For AI companies, the goal is predictable, high-quality text access without case-by-case legal landmines.

What Writers Can Do Now

  • Register your copyrights promptly. It strengthens remedies and speeds action if there's infringement.
  • Audit your book's exposure. Search shadow libraries for your titles. Check datasets discussed in the press. Set alerts for unusual PDFs.
  • Test for memorization. Prompt major models for specific passages from your work. Save outputs, timestamps, and prompts. This record matters; a scripted sketch follows this list.
  • Send takedowns. If you find pirated copies or verbatim regurgitation, document it and use DMCA notices or platform reporting tools.
  • Update your contracts. Specify whether "AI training rights" are granted, restricted, or licensed separately. Add reporting requirements and audit clauses.
  • State your preferences. Use available opt-out mechanisms (robots directives, platform-level controls) and include "noAI" metadata where supported; see the robots.txt sketch after this list.
  • Join collective efforts. Trade groups help with negotiation leverage, legal guidance, and monitoring.
  • Explore AI on your terms. Use tools that respect licensing, and build workflows that save time without giving away rights.
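The memorization test mentioned above can be scripted so each probe is recorded the same way. Below is a minimal sketch, assuming the official openai Python client and an OPENAI_API_KEY environment variable; the model name and the passage are illustrative placeholders, and the same pattern works with any provider's SDK. It asks the model to continue a distinctive passage from your book and appends the prompt, output, and timestamp to a log file you can keep as evidence.

    # Minimal sketch: probe a model for verbatim continuation of a passage and log the result.
    # Assumes the official `openai` Python client and an OPENAI_API_KEY in the environment;
    # the model name and passage below are illustrative placeholders.
    import json
    from datetime import datetime, timezone
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    passage_opening = "Paste a distinctive sentence or two from your book here."
    prompt = f"Continue this passage word for word:\n\n{passage_opening}"

    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative; repeat the test for each model you care about
        messages=[{"role": "user", "content": prompt}],
        temperature=0,   # low randomness makes verbatim matches easier to compare
    )

    record = {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "model": "gpt-4o",
        "prompt": prompt,
        "output": response.choices[0].message.content,
    }

    # Append to a running JSON Lines log so every test keeps its date, prompt, and output.
    with open("memorization_log.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

    print(record["output"][:500])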

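Opt-out signals are, for now, mostly crawler directives. The sketch below writes a robots.txt that disallows several publicly documented AI training crawlers; the user-agent names are real, but compliance is voluntary and coverage changes over time, so treat this as a preference signal and pair it with whatever platform-level or "noAI" metadata controls your host supports.

    # Minimal sketch: generate a robots.txt asking common AI training crawlers to stay out.
    # These user-agent names are publicly documented by their operators, but honoring
    # robots.txt is voluntary, so this states a preference rather than enforcing one.
    AI_CRAWLERS = [
        "GPTBot",           # OpenAI's training crawler
        "Google-Extended",  # opt-out token for Google AI training
        "CCBot",            # Common Crawl
        "ClaudeBot",        # Anthropic
        "PerplexityBot",    # Perplexity
    ]

    rules = "\n\n".join(f"User-agent: {bot}\nDisallow: /" for bot in AI_CRAWLERS)

    with open("robots.txt", "w", encoding="utf-8") as f:
        f.write(rules + "\n")

    print(rules)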

What to Watch Next

Near term: motions to dismiss, discovery battles over datasets, and potential early rulings on memorization and provenance. Mid term: whether the court orders audits or data deletion, and whether the industry pivots to broader book licensing.

The bigger picture is simple. Writers want consent, credit, and payment. If the court pushes the market in that direction, the incentives to produce books get stronger - and so does the trust behind the tech.

