Authors Sue Apple, Claim Apple Intelligence Was Trained on Thousands of Pirated Books

Authors sue Apple, alleging its AI learned from a trove of pirated books. The case could set rules on training data, payments, and how machines mimic writers' voices.

Published on: Nov 23, 2025

Scandal in Silicon Valley: Apple Faces Lawsuit Over Alleged Use of Thousands of Pirated Books to Train Apple Intelligence

Writers are suing Apple for allegedly training Apple Intelligence on a massive stash of pirated books. The case targets the heart of modern AI: where training data comes from, who gets paid, and what happens when tech and creative work collide.

For authors, the stakes are obvious. If your work trained systems that now mimic your voice or compete with your titles, you want answers - and compensation.

Background of Apple Intelligence

Apple Intelligence is Apple's suite of AI features across iPhone, iPad, and Mac. It summarizes messages, rewrites text, automates actions, and personalizes content.

Behind the scenes are large language models such as OpenELM and Apple's Foundation Models. These systems need huge volumes of text. The lawsuit claims Apple pulled from sources it shouldn't have.

How the Pirated Books Controversy Emerged

Researchers and digital rights groups flagged a dataset called Books3 - a giant library scraped from pirate sites. Multiple AI companies have been linked to it. The suit says Apple is among them.

  • Over 183,000 digital books
  • Works by major authors and publishers
  • Thousands of titles that still earn royalties

The Plaintiffs Behind the Lawsuit

Authors Grady Hendrix and Jennifer Roberson filed a class-action case on behalf of writers whose books appear in Books3. Their core claims:

  • Apple used their copyrighted books without permission
  • Their writing was fed into Apple Intelligence models
  • Apple concealed the sources of its training data
  • Apple kept a private internal library of pirated content
  • The models output text that competes with their work

They're seeking damages, restitution, and an injunction that would stop Apple from using unlicensed texts in its models.

What Is Books3 and Why It Matters

Books3 is part of The Pile, a dataset assembled by EleutherAI. It includes books scraped from piracy hubs such as Bibliotik. For writers, the key issues are simple:

  • No consent from authors
  • Copyrighted works from major houses included
  • Used for commercial AI training
  • No notice or payment to rights holders

With Apple Intelligence embedded across millions of devices, the complaint argues Apple benefits from IP it didn't license.

How AI Training Works - And Where Copyright Issues Arise

Language models are trained on vast text corpora to learn patterns, style, and context. Sources can include public domain books, open datasets, Wikipedia, partner licenses - and, in disputed cases, scraped or unlicensed material.
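To make the provenance question concrete, here is a minimal sketch of how a training corpus might be split by license label before training. It is an illustration only, not a description of Apple's or any other company's pipeline. The `Document` class, the `ALLOWED_LICENSES` set, the label names, and the sample titles are all hypothetical; real datasets rarely carry metadata this clean.

```python
from dataclasses import dataclass

# Hypothetical license labels, invented for this illustration.
ALLOWED_LICENSES = {"public_domain", "open_license", "partner_license"}


@dataclass
class Document:
    title: str
    source: str
    license: str  # e.g. "public_domain", "partner_license", "unknown"


def split_by_provenance(docs):
    """Return (cleared, disputed) lists based on each document's license label."""
    cleared = [d for d in docs if d.license in ALLOWED_LICENSES]
    disputed = [d for d in docs if d.license not in ALLOWED_LICENSES]
    return cleared, disputed


# Sample entries are made up; they mirror the source types named above.
corpus = [
    Document("Pride and Prejudice", "public domain archive", "public_domain"),
    Document("Encyclopedia article", "open encyclopedia", "open_license"),
    Document("Recent novel", "scraped from a piracy site", "unknown"),
]

cleared, disputed = split_by_provenance(corpus)
print("Cleared for training:", [d.title for d in cleared])
print("Needs review or a license:", [d.title for d in disputed])
```

The design choice worth noting is that anything without a recognized label lands in the disputed pile by default. Material of unknown origin is therefore treated as unlicensed until someone clears it.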

The legal fight centers on whether ingesting copyrighted works for training is infringement. Some argue it's fair use; others say it's equivalent to scanning books to build a product without permission.

For context on fair use, see the U.S. Copyright Office's guidance on fair use.

Key Allegations Against Apple

  • Use of pirated books: Apple allegedly used Books3 and similar datasets to train OpenELM and its Foundation Models.
  • Lack of transparency: Apple has not disclosed full training data sources.
  • Private training library: Plaintiffs claim Apple kept an internal archive of copyrighted books.
  • Market dilution: Plaintiffs say outputs can imitate an author's style, reducing demand for original work.
  • Unfair commercial advantage: Apple Intelligence helps sell devices using unlicensed IP, according to the suit.

Apple's Position and Industry Context

Apple hasn't issued a detailed public response to this case. Historically, it has said its models use licensed, publicly available, and user-provided data, with more on-device processing and user controls than rivals. The complaint challenges those claims.

Industry-Wide Problem: AI Models and Copyright

Apple is not alone. OpenAI, Google, and Meta have all faced claims tied to training data sourced from books, news, and websites. The bigger picture: AI development outpaced clear licensing norms. Courts are being asked to draw the lines.

Potential Legal Consequences for Apple

  • Monetary damages
  • Removal of copyrighted books from training sets
  • Limits on how Apple Intelligence operates
  • Mandatory data licensing for future training
  • Public disclosure of training sources

Impact on Authors and the Creative Economy

Writers worry about lost income, style imitation, and weaker copyright protections. A 2024 Authors Guild survey reported strong concern among authors about AI's impact on livelihoods and voice imitation.

For more context on advocacy and author rights, see the Authors Guild.

Summary of Key Allegations vs. Apple's Expected Defenses

  • Use of pirated books: Plaintiffs say Books3 and similar libraries were used. Apple may argue: Data was mixed at scale; individual works aren't identifiable.
  • Copyright infringement: Works used without permission. Apple may argue: Training is transformative and falls under fair use.
  • Market dilution: Outputs compete with real books. Apple may argue: Features are assistive, not substitutes for authors.
  • Lack of transparency: Sources concealed. Apple may argue: Disclosure is limited for proprietary reasons.
  • Commercial benefit: Unlicensed inputs boosted features. Apple may argue: Any benefit is indirect and not tied to specific titles.

Why This Lawsuit Could Set a New Precedent for AI Regulation

Apple is a dominant player in consumer tech and fiercely protective of its own IP. A ruling against the company could ripple across the industry.

  • Clearer rules on what data can train models
  • Pressure to license books and pay rights holders
  • Greater transparency requirements
  • New licensing markets for authors
  • Slower release cycles for AI features across Big Tech

What Happens Next in Court

The case, filed in the Northern District of California, seeks a jury trial, injunctions, damages, restitution, attorney fees, and class certification. If the class is certified, thousands of authors could join.

What Writers Can Do Right Now

  • Audit where your books appear online, including piracy sites.
  • Register your copyrights and keep records of editions and formats.
  • Join or follow author advocacy groups for legal updates.
  • Clarify licensing terms with your agent or publisher for AI-related uses.
  • Track and document AI outputs that appear to imitate your style.

Conclusion

This case asks a blunt question: if your work trained AI, do you deserve a check and a choice? As courts weigh that question, writers should push for contracts, licensing models, and product disclosures that respect creative labor.

FAQ

What is Apple being sued for?
Apple is accused of using thousands of pirated books to train Apple Intelligence models without permission or payment.

Who filed the lawsuit?
Authors Grady Hendrix and Jennifer Roberson filed a class action on behalf of writers whose works appear in the Books3 dataset.

What is Books3?
A dataset of more than 180,000 books scraped from piracy sources, reportedly used by several AI companies for model training.

Could Apple face serious penalties?
Yes. If the plaintiffs prevail, potential outcomes include damages, training set removals, operational limits, and new licensing requirements.

Why does this matter to writers?
It could shape how U.S. courts treat the data used to train AI and whether authors must be compensated when their works are ingested.
