Authors Sue Salesforce in Class Action Alleging XGen Trained on Pirated Books

Authors sue Salesforce, alleging XGen was trained on pirated books. The class action could set rules on fair use vs. licensing, affecting how writers get paid for training uses.

Published on: Oct 19, 2025

Authors sue Salesforce over alleged XGen training on pirated books

On Oct. 20, 2025, a group of writers filed a putative class action in federal court in San Francisco alleging that Salesforce's large language model, XGen, was trained on hundreds of thousands of copyrighted books obtained without permission.

The case is led by attorney Joseph Saveri. The complaint claims that mass book copying is the hidden fuel behind commercial AI performance and seeks relief for authors whose works were allegedly used without consent.

What's alleged

  • Salesforce's XGen was trained on datasets containing pirated books, according to the complaint.
  • The filing frames this as part of a broader pattern in the AI sector: using unlicensed books to build and improve models.
  • The suit is a proposed class action: if the court certifies the class, other authors who fit the class definition could be covered without filing their own suits.

These are allegations, not proven facts. Salesforce's position will be reflected in forthcoming court filings.

Why this matters for working writers

Books are high-signal training material. If courts accept unlicensed ingestion as lawful, the market value of your backlist and future rights can erode. If courts reject it, licensing revenue and consent standards may strengthen.

The outcome could shape how AI companies acquire text data and how writers are paid, or bypassed, when their work is used to train models.

Key questions this case could answer

  • Is large-scale ingestion of books for training protected by fair use, or does it require permission and payment?
  • Can rights holders identify and prove their works were used in training when datasets are opaque?
  • What remedies apply: injunctions, damages, statutory penalties, or licensing frameworks?

Practical steps for authors right now

  • Audit your catalog: Keep clean records of copyrights, editions, and publication dates. Register works that aren't registered.
  • Monitor datasets and repositories: Track whether known book datasets reference your titles. Save screenshots and links as evidence.
  • Review your contracts: Check publisher agreements for data-mining and AI clauses. Clarify who controls licensing for training use.
  • Use policy levers: Consider DMCA takedowns where your works are shared without authorization. Document everything.
  • Join collective advocacy: Coordinate with professional groups to amplify negotiations and legal strategy.
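
For authors comfortable with a little scripting, the monitoring step above can be partly automated. The following is a minimal sketch, not a definitive tool: it assumes you have a plain-text index or file listing from a dataset saved locally, and it checks each line for your titles after stripping case, accents, and punctuation. The catalog titles, index lines, and function names are all hypothetical and chosen for illustration.

```python
import re
import unicodedata


def normalize(title: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def find_catalog_matches(catalog, dataset_lines):
    """Return (title, line_number, line) for each line that contains a catalog title."""
    wanted = {normalize(t): t for t in catalog if normalize(t)}
    matches = []
    for line_number, line in enumerate(dataset_lines, start=1):
        # Pad with spaces so titles only match on whole-word boundaries.
        padded = f" {normalize(line)} "
        for key, original in wanted.items():
            if f" {key} " in padded:
                matches.append((original, line_number, line.strip()))
    return matches


# Hypothetical catalog and dataset index lines, for illustration only.
catalog = ["The Glass Orchard", "Café Nocturne"]
index_lines = [
    "00017\tthe_glass_orchard.epub\tunknown",
    "00018\tSome Other Book.txt\tunknown",
    "00019\tCafe Nocturne (2nd ed).epub\tunknown",
]
for title, line_number, line in find_catalog_matches(catalog, index_lines):
    print(f"{title!r} appears on line {line_number}: {line}")
```

A match here is only a lead, not proof that a work was used in training. Short or common titles will produce false positives, so pair any hit with the author name, then save the source file and a dated screenshot as evidence.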

What happens next

Expect a motion to dismiss and briefing on fair use, consent, and damages. Discovery could surface how XGen's training data was sourced. Settlement is possible if licensing terms emerge, but a court ruling would have wider impact.

For writers, this is a signal to protect rights, keep records tight, and stay informed. The business model for AI and books is being tested in court.

