Anthropic Destroyed Millions of Books to Feed Its AI—And the Courts Approved
Anthropic trained its AI by buying physical books, tearing out their pages, scanning them, and discarding the originals. A judge ruled the practice fair use, sparking ethical debates.

Anthropic’s Controversial Method of Training AI with Physical Books
Anthropic, the Google-backed AI startup behind the Claude models, took a strikingly physical approach to gathering training data. Instead of relying solely on digital copies, the company purchased millions of print books, cut the pages from their bindings, scanned them, and then discarded the originals. The “devouring” of books here is not metaphorical; it is literal.
The practice came to light through a recent copyright ruling that largely favored Anthropic and, more broadly, the tech industry’s appetite for data. US District Judge William Alsup held that training large language models on books Anthropic legally purchased qualifies as fair use, even without explicit permission from authors. The same order, however, found that Anthropic’s separate trove of pirated books was not protected, leaving that part of the case to proceed.
How Anthropic’s Approach Works Legally
A key part of Anthropic’s strategy lies in the first-sale doctrine. This legal principle lets the buyer of a physical book resell, lend, or dispose of that particular copy without the copyright holder’s permission, which is why secondhand book sales are legal. Anthropic leveraged it to acquire books in bulk without negotiating licenses.
Stripping the pages from their bindings made high-volume scanning cheaper and simpler. Because Anthropic destroyed each print copy after scanning and used the digital version only internally, the judge treated the digitization as a space-saving format change, a factor that weighed in favor of finding the process legally acceptable.
Ethical and Practical Issues
Despite the legal win, the method raises ethical questions. Destroying millions of physical books for data extraction can be seen as wasteful and disrespectful to authors and publishers. The practice also highlights a broader problem: AI companies are searching aggressively for high-quality data sources, sometimes at the expense of the original content creators.
Anthropic isn’t alone in its hunger for book data. Others, including Meta, have trained on pirated books, drawing ongoing lawsuits from authors. Meanwhile, archivists and organizations like the Internet Archive and Google Books have long digitized books without destroying them, showing that non-destructive alternatives exist.
What This Means for AI Development
- AI companies are pushing legal boundaries to acquire training data.
- Legal loopholes like the first-sale doctrine can be exploited in unexpected ways.
- There’s tension between data acquisition methods and ethical responsibility.
- Destructive scanning methods highlight a shortsighted approach to sourcing data.
As AI models continue to grow in scale and complexity, the demand for diverse and high-quality training data will only increase. This case serves as a reminder that the industry’s hunger for data sometimes leads to questionable practices that may not be sustainable or respectful of content creators.
For IT and development professionals interested in AI training practices and ethical data handling, understanding these legal and operational challenges is crucial. To explore AI models and training methodologies further, visit Complete AI Training for courses and resources.