In a nutshell
- Adobe faces a proposed class-action lawsuit alleging it trained AI on pirated books.
- The case targets Adobe's SlimLM, reportedly trained on the SlimPajama dataset.
- The dispute spotlights the legal heat around how tech companies source training data.
- Outcomes here could set the tone for future AI development and IP rules.
Why creatives should care
If your work lives online, it's likely been scraped, summarized, or sampled. This lawsuit isn't just about a model name; it's about consent, credit, and compensation for creative labor. The result could influence how your art, copy, photos, and scripts are used by AI, both legally and financially.
It also affects your tools. If datasets are found to be off-limits, features you rely on could change, slow down, or cost more. Plan for that.
The lawsuit at a glance
Adobe has been building SlimLM, a small language model aimed at document assistance, especially on mobile. A proposed class action claims SlimLM was trained on SlimPajama-627B, a dataset said to derive from RedPajama, which includes the disputed Books3 collection of around 191,000 books.
Author Elizabeth Lyon alleges her works were included without consent. Similar claims have circulated across the industry, with multiple companies named in related suits. The common thread: training on copyrighted books without permission.
What's really at issue
This isn't an anti-AI moment; it's a consent moment. Creators are asking for a say in how their work is ingested, attributed, and monetized. Companies want broad data to improve models. Courts are being asked to clarify where fair use ends and where licensing begins.
Expect some mix of three outcomes: clearer licensing norms for books and other media, stronger transparency about training data, and content credentials that track provenance.
Practical steps to protect your work now
- Add provenance: Use Content Credentials (C2PA) where possible to bind authorship and edit history to your files. It's not a silver bullet, but it helps with attribution and trust.
- Set AI-use preferences: Add "noai/noimageai" metadata and robots rules on sites you control (see the sketch after this list). Many crawlers respect these signals, though none are obliged to.
- Register your key works: Formal registration strengthens your position if you ever need to enforce rights.
- Watch your uploads: Avoid feeding entire manuscripts, high-res art, or client-sensitive work into public models unless your contract allows it.
- Ask vendors hard questions: What datasets were used? Are commercial rights covered? Is there an indemnity? Get answers in writing.
- Watermark where it matters: Visible or invisible marks can deter misuse and help prove origin.
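For the opt-out signals mentioned above, here's a minimal sketch of what they can look like. The "noai"/"noimageai" values are an informal convention honored by some dataset-building tools rather than a standard, and the crawler names shown (GPTBot, Google-Extended, CCBot) are examples; which bots exist and whether they respect robots.txt changes over time, so check each vendor's current documentation.

```html
<!-- In each page's <head>: informal opt-out values some scrapers check -->
<meta name="robots" content="noai, noimageai">
```

```text
# robots.txt at your site root: disallow example AI-training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
```

Neither signal is enforceable on its own, so treat them as one layer alongside provenance, registration, and contracts.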
If you build client work with AI
- Update contracts: Disclose AI use, define ownership, and include warranties that fit your tools and risk tolerance.
- Prefer enterprise plans: Use offerings with training opt-outs, audit logs, and IP indemnification.
- Keep a paper trail: Save prompts, versions, and sources. If something's challenged, you'll want receipts.
- Avoid cloning the style of living artists without consent: it's a reputational and legal risk.
- License references: If your deliverable leans on a specific text or visual source, get a license or swap in cleared material.
What could happen next
Courts may push the industry toward licensed datasets, verifiable provenance, and clear creator controls. That could mean better attribution and new revenue streams for rights holders. It could also mean some AI features become more expensive or slower to ship.
For creatives, the practical move is to assume change is coming and prep your workflow: provenance in, permissions tracked, vendors vetted. You'll be ready no matter how the gavel falls.
Useful resources
Level up your AI workflow (without stepping on IP)
If you want structured training that respects rights and reduces risk, browse role-based tracks here: Complete AI Training: Courses by Job. Build a setup that's fast, compliant, and client-ready.