Apple faces two new lawsuits over AI trained on pirated books

Apple faces fresh lawsuits over training Apple Intelligence on pirated books from shadow libraries. The cases test the limits of fair use and raise new risks around how teams source training data.

Published on: Oct 14, 2025

Apple sued again over alleged use of copyrighted books to train Apple Intelligence

Apple is facing two new lawsuits claiming it trained Apple Intelligence models on pirated books from "shadow libraries." The cases put Apple alongside Meta, OpenAI, and others already under legal fire for similar practices.

At the center is fair use: tech companies say training on copyrighted text without permission is lawful. Plaintiffs argue wholesale ingestion of books crosses the line.

What's new

Neuroscientists Dr. Susana Martinez-Conde and Dr. Stephen Macknik allege Apple used Books3, a shadow library of more than 190,000 works, to train its OpenELM model. A separate suit from authors Grady Hendrix and Jennifer Roberson claims Applebot scraped material from shadow libraries for Apple Intelligence.

Books3 has appeared in other cases, including Kadrey v. Meta and Bartz v. Anthropic, where courts ruled largely in the AI companies' favor on fair use. Even so, the broader copyright question remains unsettled as more cases move forward.

Anthropic recently agreed to pay $1.5 billion to settle a class action over its use of roughly 500,000 pirated works. The Hendrix/Roberson case against Apple also seeks class-action status.

Why it matters for teams building with AI

If you build, integrate, or buy models, liability doesn't stop at the lab. Procurement, engineering, product, and legal all have a stake in how training data is sourced and documented.

  • Demand provenance: require a bill of materials for datasets used in training and fine-tuning, not just model cards (see the first sketch after this list).
  • Contract for risk: add warranties on lawful data sourcing, audit rights, takedown procedures, and indemnification for copyright claims.
  • Respect opt-outs: honor robots.txt, meta tags, and publisher opt-out feeds across scraping and retraining workflows (see the robots.txt sketch after this list).
  • Moderate inputs and outputs: filter training corpora and log prompts/outputs; add similarity checks to reduce verbatim reproduction of copyrighted text (see the overlap sketch after this list).
  • Segment use cases: prefer commercially licensed or internal corpora for features that risk text reproduction; use retrieval-augmented generation (RAG) over licensed sources instead of fine-tuning on dubious data.
  • Prepare for takedowns: maintain data deletion paths, retraining plans, and versioned release notes to show remediation.
  • Insure and escalate: review IP insurance coverage and establish a clear incident process for copyright complaints.
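
To make the provenance bullet concrete, here is a minimal sketch of what a machine-readable dataset bill of materials could look like in Python. The record fields, names, and URLs are illustrative assumptions, not an established standard.

```python
from dataclasses import dataclass

@dataclass
class DatasetRecord:
    """One entry in a hypothetical dataset bill of materials (BOM)."""
    name: str              # internal dataset identifier
    source_url: str        # where the snapshot was obtained
    license: str           # SPDX identifier or contract reference
    acquired_on: str       # ISO date of the snapshot
    opt_out_checked: bool  # were publisher opt-outs honored?
    notes: str = ""        # provenance caveats, audit pointers, etc.

# Illustrative entries; every name and URL here is a placeholder.
bom = [
    DatasetRecord(
        name="licensed-news-2024",
        source_url="https://example.com/licensed-feed",
        license="Commercial-Agreement-123",
        acquired_on="2024-11-02",
        opt_out_checked=True,
    ),
    DatasetRecord(
        name="scraped-forum-dump",
        source_url="https://example.org/dump.tar",
        license="unknown",
        acquired_on="2025-01-15",
        opt_out_checked=False,
    ),
]

def needs_review(records):
    """Flag entries a reviewer should escalate before training."""
    return [r.name for r in records
            if r.license.lower() in ("", "unknown") or not r.opt_out_checked]

print(needs_review(bom))  # ['scraped-forum-dump']
```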
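For the opt-out bullet, Python's standard library ships urllib.robotparser, which can check whether a given crawler user agent is allowed to fetch a URL. The robots.txt content and bot names below are invented for illustration; real workflows would fetch the live file per site.

```python
from urllib import robotparser

# Hypothetical robots.txt: one publisher blocks an AI-training bot
# while allowing everything else.
robots_txt = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# The hypothetical AI crawler is blocked; a generic agent is not.
print(rp.can_fetch("ExampleAIBot", "https://example.com/article"))  # False
print(rp.can_fetch("GenericBot", "https://example.com/article"))    # True
```

Running this check at both scrape time and retrain time helps show that opt-outs were honored when the snapshot was taken, not just when it was first collected.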
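For the overlap bullet, one simple (if crude) similarity check is word-level n-gram overlap between a model output and a protected reference text. The n-gram size and threshold below are arbitrary assumptions; production systems typically use more robust near-duplicate detection.

```python
def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """Word-level n-grams, lowercased, for overlap comparison."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(candidate: str, reference: str, n: int = 8) -> float:
    """Fraction of the candidate's n-grams that also appear in the reference."""
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    return len(cand & ngrams(reference, n)) / len(cand)

# Hypothetical gate: send outputs that copy long spans verbatim to review.
THRESHOLD = 0.2

def looks_verbatim(output: str, protected_text: str) -> bool:
    return overlap_ratio(output, protected_text) >= THRESHOLD

reference = "the quick brown fox jumps over the lazy dog near the river bank"
print(looks_verbatim("fox jumps over the lazy dog near the river", reference))  # True
print(looks_verbatim("an unrelated sentence about model outputs and nothing else", reference))  # False
```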

Where the law stands

Fair use is the core defense for training on copyrighted text. Courts have issued mixed signals across cases, and more rulings are coming. For policy context, see the U.S. Copyright Office's resources on AI and fair use.

Other Apple legal pressure

Apple has also faced separate suits over advertising Apple Intelligence features that were later delayed. X (formerly Twitter) sued Apple as well over its partnership with OpenAI, which powers parts of Apple Intelligence.

What to watch next

  • Whether courts certify class actions against Apple.
  • Any discovery confirming Applebot's sources and OpenELM's training data.
  • New guidance from regulators or industry standards on dataset provenance and disclosure.
  • Shifts in provider contracts as buyers push for stronger IP warranties and audit rights.

If your team needs structured training on AI compliance, dataset sourcing, and product risk, explore practical tracks by role here: AI courses by job.

