Judges Struggle to Define AI's Book Piracy Impact
Can AI companies freely use copyrighted books to train their models? That question is at the heart of over 40 lawsuits filed since 2022, targeting AI firms for allegedly using millions of copyrighted works without consent or payment. These cases highlight a clash between authors' rights and AI development.
Two recent court rulings, involving lawsuits against Anthropic and Meta, offer partial answers. Both judges found that training large language models (LLMs) on authors’ books qualifies as “fair use” because the resulting AI is “transformative” — it produces a fundamentally different product that doesn’t directly compete with the original works. However, these decisions don't settle the broader legal question, as the judges diverged significantly on key points.
Fair Use in AI Training: What the Rulings Say
Authors fear that AI chatbots threaten their livelihood by summarizing or generating competing content. Media coverage hailed the rulings as major wins for AI companies. But the reality is more complex. The rulings apply narrowly to specific facts and don’t establish a blanket rule that AI training is always fair use.
One judge ruled that Anthropic’s use of “pirate libraries” — collections containing over 7 million pirated books — was not fair use, even though he approved the training itself. This means Anthropic could still face trial and substantial damages for retaining infringing copies. Similarly, although the Meta ruling found training to be fair use, Meta may face further legal action for allegedly distributing pirated books via BitTorrent while assembling its training data.
Disagreement Over Market Harm and Competition
A key legal dispute centers on whether AI-generated works harm the market for original authors. In Anthropic’s case, Judge William Alsup argued copyright law doesn’t protect authors from competition, comparing AI training to teaching schoolchildren to write, which naturally leads to more creative works.
By contrast, Judge Vince Chhabria, ruling on the Meta case, called this analogy “inapt.” He emphasized AI’s unique ability to scale creative output massively and argued that using copyrighted books to build tools that generate billions of dollars, while potentially damaging authors’ markets, sits uneasily with fair use principles.
Chhabria also pointed out that harm may vary by author. Famous authors might be less affected, while emerging writers could struggle to gain visibility and sales amid AI-generated competition. To succeed, plaintiffs must demonstrate specific market harm, which the Meta plaintiffs failed to do.
LLMs’ Outputs Pose Further Legal Challenges
Both judges focused their fair use analysis on the training inputs, not on outputs that might themselves infringe copyrights. Yet recent research shows that LLMs sometimes reproduce training text verbatim. Studies have found that 8–15% of responses in ordinary conversation copy text directly from web sources, and that some outputs consist entirely of passages from copyrighted works such as Harry Potter and 1984.
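To make the idea of “verbatim reproduction” concrete, the sketch below measures what fraction of an output string is covered by substrings that also appear word-for-word in a source text. This is a toy illustration, not the methodology of the studies cited above; the `verbatim_overlap` name, the `min_len` threshold, and the greedy matching strategy are all simplifying assumptions.

```python
def verbatim_overlap(output: str, source: str, min_len: int = 50) -> float:
    """Return the fraction of characters in `output` covered by
    substrings of length >= min_len that appear verbatim in `source`.

    A crude stand-in for the overlap metrics used in memorization
    research; real studies work on model tokens and large corpora.
    """
    if not output:
        return 0.0
    covered = [False] * len(output)
    i = 0
    while i + min_len <= len(output):
        j = i + min_len
        if output[i:j] in source:
            # Greedily extend the match as far as it stays verbatim.
            while j < len(output) and output[i:j + 1] in source:
                j += 1
            for k in range(i, j):
                covered[k] = True
            i = j
        else:
            i += 1
    return sum(covered) / len(output)
```

For example, an output that quotes a long passage from the opening line of 1984 would score high, while unrelated text would score zero. The naive substring search is quadratic; a production version would use suffix arrays or token-level n-gram indexes.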
This raises questions about whether AI models essentially contain unauthorized copies of protected texts, complicating the fair use defense. One respected legal scholar, formerly on Meta’s defense team, acknowledged these findings challenge defendants’ claims and may require courts to reassess how training and output relate to copyright infringement.
What This Means for Writers and the Legal Landscape
Authors whose works can be fully reproduced by AI may file more lawsuits, but such cases demand costly and specialized research rarely accessible outside AI companies. Meanwhile, the tech industry has little incentive to support transparency or fund independent studies.
The recent rulings represent early steps in defining responsible AI development and copyright boundaries. Copyright law aims to encourage creation by rewarding authors, not merely to protect them from competition. AI models remix existing work without true creativity, often producing derivative or low-quality content that can crowd out human voices.
Balancing AI innovation with protecting writers’ livelihoods remains unresolved. Whether broad fair use for AI training leads to a culture with fewer original human works is a critical question courts must face as new cases unfold.