Publishers Sue Meta Over Unauthorized Use of Books to Train Llama AI
Five major publishers and bestselling author Scott Turow filed a class action lawsuit against Meta and CEO Mark Zuckerberg on May 5, accusing the company of copying millions of copyrighted works without permission to develop its Llama large language model. The suit marks the first copyright infringement case brought directly by book publishers against an AI company.
Elsevier, Cengage, Hachette, Macmillan, and McGraw Hill, organized with help from the Association of American Publishers, allege that Meta deliberately sourced books from illegal pirate sites to train Llama. The complaint, filed in New York federal court, seeks maximum statutory damages, an injunction blocking future infringement, and destruction of all infringing copies.
The Core Allegation
Meta scraped, torrented, and downloaded unauthorized copies of books from known pirate libraries, the complaint states. The company then used these works to build an AI system capable of generating outputs that compete directly with published books.
The publishers argue Meta's Llama functions as "an infinite substitution machine." The system can output verbatim copies of training material, paraphrase existing works, generate imitations of popular novels, and flood markets with AI-generated books that displace human-authored content.
The complaint points to evidence that Meta employees understood the legal risks. Emails cited in the lawsuit suggest staff members viewed their use of shadow libraries as "legally questionable" but proceeded anyway.
Market Displacement Already Occurring
The publishers claim the threat is not theoretical. AI-generated books have saturated Amazon's Kindle store in volumes that materially displace human-authored works, according to the complaint. Llama can generate travel guides, book summaries, study guides, and what the publishers describe as "passable" imitations of opening chapters from published novels.
This market dilution argument builds on a finding by federal Judge Vince Chhabria in a separate 2025 case against Meta. Although Chhabria ruled that AI training on copyrighted works can constitute fair use, he flagged "market dilution" (the disruption of human authors' marketplace) as a potential issue.
Mixed Precedent in AI Copyright Cases
The publishers' lawsuit arrives as courts issue conflicting rulings on AI copyright claims. In June 2025, Judge William Alsup found that Anthropic's use of books to train its Claude AI system qualified as fair use for training purposes. However, Alsup ruled that Anthropic's decision to retain millions of unauthorized downloads in a permanent research library was not fair use, a finding that led to a $1.5 billion settlement.
In the earlier Meta case, Judge Chhabria rejected claims that using works from pirate sites was "automatically" infringing. But he suggested the plaintiffs had a valid theory about market disruption.
The publishers' complaint leans heavily into that market dilution theory, arguing that Meta made a "calculated decision" to copy literary works and appropriate their value rather than license them.
What the Publishers Want
The lawsuit seeks class status for all copyright owners whose works were used to train Llama. This includes novelists, poets, nonfiction writers, and scientific journal publishers, a significantly broader class than in previous publishing-related AI suits.
The court is being asked to declare that Meta's practices violate copyright law, issue an injunction, award monetary damages, require Meta to disclose its training materials and methods, and order destruction of infringing copies.
Meta's Response and Industry Stakes
Meta said it will "fight this lawsuit aggressively," noting that courts have already found training on copyrighted material can qualify as fair use.
The publishers framed their position as supporting AI development while protecting creators' rights. "Meta's mass-scale infringement isn't public progress," said Maria Pallante, president and CEO of the Association of American Publishers. "AI will never be properly realized if tech companies prioritize pirate sites over scholarship and imagination."
McGraw Hill CEO Philip Moyer said the company believes AI has an important role in education but stressed the need to protect "foundational intellectual property rights of human authors." Hachette CEO David Shelley called Meta's conduct "wholesale theft," while Macmillan CEO Jon Yaged described it as "unconscionable."
The lawsuit is one of more than 100 copyright cases now filed against AI developers in U.S. courts. Two of the publisher plaintiffs, Cengage and Hachette, are also seeking to intervene as class representatives in a separate copyright case against Google over its Gemini AI system.
For writers, the case highlights a central tension: whether AI companies can build profitable systems on unpaid use of creative work, or whether creators have a right to compensation when their published works train commercial AI models.