Adobe hit with class-action lawsuit over alleged use of pirated books to train AI
Adobe is facing a class-action lawsuit in the US from Oregon author Elizabeth Lyon, who claims the company trained its AI models on pirated books without permission. The case targets SlimLM, Adobe's family of small language models used for document assistance on mobile devices. Lyon argues the training data included her books and those of other authors, used without consent, credit, or compensation.
Adobe denies the allegations, saying SlimLM was trained on SlimPajama-627B, an open-source dataset released by Cerebras in 2023. The lawsuit counters that SlimPajama is a derivative of RedPajama, which allegedly includes Books3, a collection of nearly 200,000 pirated books. The complaint also says Adobe "repeatedly downloaded, copied, and processed those works during the preprocessing and pretraining of the models." Similar datasets have surfaced in previous lawsuits against tech companies, raising the stakes for how AI training data is sourced.
Why this matters to writers
This case goes to the core of how your work gets used in AI systems. If the court finds that derivative datasets bundled copyrighted books, it could set a precedent for consent, credit, and compensation. It also shows that even small, on-device models may be built on large-scale scrapes. Translation: your catalog might be part of someone's training data unless rules are enforced.
What Lyon is seeking
Lyon says she's "committed to vigorously prosecuting this action on behalf of the other members of the class" and has the "financial resources to do so." The complaint seeks statutory and other damages, reimbursement of attorney fees, and a declaration of willful infringement.
What you can do now
- Register your copyrights for current and future works. It strengthens your position in any dispute.
- Add clear AI-training clauses to your publishing and licensing contracts. Spell out what is and isn't allowed.
- Audit your digital distribution. Limit full-text availability where it doesn't serve your goals, and watch for unauthorized mirrors.
- Review the terms of AI tools you use. Opt out of data collection or training where possible.
- Keep clean records: drafts, timestamps, registrations, and publication history. Documentation wins cases.
- Band together. Professional organizations and class actions can push for stronger standards and enforcement.
What happens next
Expect procedural moves, discovery, and expert testimony on data lineage. Possible outcomes range from damages to injunctions or data deletion orders. Even without a final ruling, the pressure may push vendors to prove dataset origins or change how they train.
For writers, the signal is clear: assert rights, tighten contracts, and keep an eye on where your words show up. The business model of AI is colliding with the business model of authorship, and the terms are being written now.
Sources named in the dispute
- SlimLM (Adobe's model), trained, according to Adobe, on the SlimPajama-627B dataset by Cerebras
- RedPajama, cited in the complaint as part of the dataset's lineage
Keep your edge
If you're experimenting with AI in your workflow, do it on your terms: vet each tool's data practices, confirm how your text is stored and used, and evaluate it for fit before it touches your manuscripts.