Publishers and Authors Sue Meta Over Unauthorized Use of Books to Train Llama AI
Five major publishers and bestselling author Scott Turow filed a class action lawsuit against Meta and CEO Mark Zuckerberg on May 5, alleging the company scraped millions of copyrighted works from pirate sites to train its Llama language model without permission or payment. The suit, filed in New York federal court, marks the first copyright infringement case brought directly by book publishers against a tech company over AI development.
The publisher plaintiffs (Elsevier, Cengage, Hachette, Macmillan, and McGraw Hill) organized the suit with the Association of American Publishers. They claim Meta deliberately downloaded and torrented unauthorized copies of literary works to build Llama's training dataset.
What the Lawsuit Alleges
The complaint describes Llama as an "infinite substitution machine" that competes directly with human-authored works. According to the plaintiffs, the model can generate verbatim copies of training material, produce near-verbatim paraphrases, create knockoffs of popular novels, and flood markets with AI-generated alternatives.
The plaintiffs cite evidence that Meta employees understood their use of shadow libraries was legally questionable and "attempted to conceal" the practice. Internal emails suggest the company knew it was accessing unauthorized sources.
The lawsuit seeks a court declaration that Meta violated copyright law, an injunction against future infringement, maximum monetary damages, an accounting of Llama's training materials and methods, and destruction of all infringing copies.
Mixed Precedent in AI Copyright Cases
Courts have delivered conflicting rulings on whether AI training on copyrighted material qualifies as fair use. In June 2025, Judge William Alsup found that Anthropic's use of unauthorized books to train Claude AI was fair use, but ruled that keeping millions of unauthorized downloads in a permanent research library was not. That finding led to a $1.5 billion settlement.
In the same month, Judge Vincent Chhabria found AI training on copyrighted works to be fair use in a separate case against Meta. However, Chhabria suggested there may be an issue with "market dilution": the possibility that AI-generated works could disrupt the market for human-authored content.
The publishers' complaint leans heavily on Chhabria's market dilution theory, arguing that Llama's outputs are already displacing human-authored works. They point to AI-generated books flooding Amazon's Kindle store as evidence the risk is not theoretical.
Industry Response
Meta said it will "fight this lawsuit aggressively," noting that courts have found AI training on copyrighted material can be fair use.
Maria Pallante, president and CEO of the Association of American Publishers, said Meta "made calculated decisions to enrich itself with literary properties that it did not create and does not own, when instead it could have partnered with publishers and authors."
McGraw Hill CEO Philip Moyer said the company supports AI's role in education but believes "protecting the foundational intellectual property rights of human authors" is essential. Hachette CEO David Shelley called the alleged conduct "wholesale theft," while Macmillan CEO Jon Yaged said it was "unconscionable" for one of the world's most valuable companies to steal from creators.
Broader Legal Landscape
Over 100 copyright lawsuits related to AI development are now pending in U.S. courts. This suit comes as two of the publisher plaintiffs, Cengage and Hachette, await a ruling on their bid to intervene in a separate case against Google over its Gemini AI service.
The publishers' complaint is broader than most publishing-related AI suits, proposing a class that includes copyright owners of novels, poems, nonfiction works, scientific journals, and other literary content. It alleges that Llama can generate travel guides, book summaries, study guides, and imitations of published works on demand.