Authors sue Salesforce over alleged XGen training on pirated books
On Oct. 20, 2025, a group of writers filed a putative class action in federal court in San Francisco alleging that Salesforce's large language model, XGen, was trained on hundreds of thousands of copyrighted books obtained without permission.
The case is led by attorney Joseph Saveri. The complaint claims that mass book copying is the hidden fuel behind commercial AI performance and seeks relief for authors whose works were allegedly used without consent.
What's alleged
- Salesforce's XGen was trained on datasets containing pirated books, according to the complaint.
- The filing frames this as part of a broader pattern in the AI sector: using unlicensed books to build and improve models.
- The suit is a proposed class action, which means other affected authors could be covered if the court certifies the class.
These are allegations, not proven facts. Salesforce's position will be reflected in forthcoming court filings.
Why this matters for working writers
Books are high-signal training material. If courts accept unlicensed ingestion, the market value of your backlist and future rights could erode. If they reject it, licensing revenue and consent standards may strengthen.
The outcome could shape how AI companies acquire text data and how writers are paid, or bypassed, when their work is used to train models.
Key questions this case could answer
- Is large-scale ingestion of books for training protected by fair use, or does it require permission and payment?
- Can rights holders identify and prove their works were used in training when datasets are opaque?
- What remedies apply: injunctions, damages, statutory penalties, or licensing frameworks?
Practical steps for authors right now
- Audit your catalog: Keep clean records of copyrights, editions, and publication dates. Register works that aren't registered.
- Monitor datasets and repositories: Track whether known book datasets reference your titles. Save screenshots and links as evidence.
- Review your contracts: Check publisher agreements for data-mining and AI clauses. Clarify who controls licensing for training use.
- Use policy levers: Consider DMCA takedowns where your works are shared without authorization. Document everything.
- Join collective advocacy: Coordinate with professional groups to amplify negotiations and legal strategy.
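For authors comfortable with a little scripting, the monitoring step above can be partly automated. The following is a minimal sketch, not a definitive tool: it assumes you have already saved a dataset's file listing or title index as plain text lines, and the function names, sample titles, and listing format are all hypothetical. It only flags lines that contain one of your titles as a whole phrase, so any hit still needs manual review before it counts as evidence.

```python
import re
import unicodedata


def normalize(title: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def find_catalog_matches(catalog, dataset_lines):
    """Return (title, line_number, line) for each listing line that
    contains a catalog title as a whole phrase.

    `catalog` is a list of your own titles; `dataset_lines` is any
    iterable of text lines from a listing you saved yourself
    (hypothetical format: one file name or record per line).
    """
    normalized = {title: normalize(title) for title in catalog}
    matches = []
    for line_number, line in enumerate(dataset_lines, start=1):
        # Pad with spaces so only whole-word phrase matches count.
        padded = f" {normalize(line)} "
        for title, key in normalized.items():
            if key and f" {key} " in padded:
                matches.append((title, line_number, line.strip()))
    return matches


if __name__ == "__main__":
    # Hypothetical titles and listing lines, for illustration only.
    my_titles = ["The Quiet Harbor", "Émile's Garden"]
    listing = [
        "12345\tSmith, J. - The Quiet Harbor (2011).epub",
        "67890\tUnrelated Title.epub",
    ]
    for title, line_number, line in find_catalog_matches(my_titles, listing):
        print(f"{title!r} appears on line {line_number}: {line}")
```

Short or generic titles will produce false positives with this kind of phrase matching, so pairing the title with the author name is a sensible refinement. Keep the saved listing, the date you retrieved it, and the script output together with your other records.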
Resources
- U.S. Copyright Office: Artificial Intelligence
- Authors Guild: AI Advocacy and Resources
- Practical AI tools for copywriting (Complete AI Training)
What happens next
Expect a motion to dismiss and briefing on fair use, consent, and damages. Discovery could surface how XGen's training data was sourced. Settlement is possible if licensing terms emerge, but a court ruling would have wider impact.
For writers, this is a signal to protect rights, keep records tight, and stay informed. The business model for AI and books is being tested in court.