Courts Rule AI Training on Copyrighted Works Is Fair Use, Setting Key Precedent for Future Disputes

Courts ruled AI training on copyrighted texts qualifies as fair use, highlighting its transformative nature and limited verbatim output. However, responsible data sourcing remains essential to avoid infringement.

Categorized in: AI News, Legal
Published on: Jul 04, 2025

Courts Agree: AI Training Ruled As Fair Use in Bartz v. Anthropic and Kadrey v. Meta

Last week, the Northern District of California delivered two key rulings on fair use concerning artificial intelligence. Judge William Alsup’s decision in Bartz v. Anthropic and Judge Vince Chhabria’s ruling in Kadrey v. Meta mark the first substantial judicial answers to copyright questions raised by training large language models (LLMs) on copyrighted texts.

Both courts concluded that AI training on copyrighted works qualifies as protected fair use. However, the legal and technological nuances remain complex. These rulings provide important guidance amid ongoing litigation and policy debates.

AI Training Is Highly Transformative

Both judges agreed that training LLMs involves a transformative use of copyrighted materials. Judge Alsup described Anthropic’s use of books as “spectacularly transformative,” emphasizing that the models do not replicate the originals but generate new outputs with a different purpose and character.

Judge Chhabria concurred, noting Meta’s use of plaintiffs’ books served a “further purpose” distinct from the originals. This aligns with the Supreme Court's emphasis on whether the secondary use adds something new and does not merely supersede the original.

Converting books into mathematical vectors in order to generate new text reflects a transformative process, much as Google Books turned scanned pages into a searchable index. That transformative character serves copyright’s constitutional purpose of promoting learning and access to information.
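For readers who want the technical intuition behind the “mathematical vectors” point, the sketch below shows, in deliberately simplified form, how text is turned into numbers before training. The whitespace tokenizer, toy vocabulary, and randomly initialized eight-dimensional embeddings are illustrative assumptions, not any party’s actual pipeline.

```python
# Illustrative only: a toy view of how text becomes vectors before training.
import numpy as np

text = "the model learns statistical patterns from text"
tokens = text.split()  # real systems use subword tokenizers, not whitespace

# Map each distinct token to an integer ID (a toy vocabulary).
vocab = {tok: i for i, tok in enumerate(dict.fromkeys(tokens))}
token_ids = [vocab[tok] for tok in tokens]

# Look up a dense vector for each token ID. These numbers are randomly
# initialized here; training adjusts them to capture statistical patterns
# rather than storing the text itself.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 8))  # 8-dimensional toy vectors
vectors = embedding_table[token_ids]

print(token_ids)      # [0, 1, 2, 3, 4, 5, 6]
print(vectors.shape)  # (7, 8): one 8-number vector per token
```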

Model Outputs, Not Memorization, Are Central

Plaintiffs argued that LLMs act as plagiarism machines, memorizing and reproducing copyrighted content. While both courts accepted that some memorization occurs, they focused on whether models actually output infringing content.

In Bartz v. Anthropic, no infringing outputs were identified. In Kadrey v. Meta, Meta’s model rarely reproduced more than 50 tokens verbatim, and then only in response to coaxing prompts. Both courts made clear that limited verbatim overlap of this kind does not by itself amount to copyright infringement.

This distinction separates lawful learning from unlawful copying. It also encourages AI developers to design models that avoid extensive verbatim outputs, reducing infringement risk while preserving transformative uses.
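For developers, a minimal sketch of the kind of verbatim-overlap spot check this encourages might look like the following. It is offered only as an illustration: the whitespace tokenizer, the helper names, and the 50-token cutoff (echoing the figure discussed in Kadrey) are assumptions, and passing such a check is not a legal safe harbor.

```python
# Hypothetical spot check: how long is the longest run of tokens a model
# output shares verbatim with a source text?

def longest_verbatim_run(output_tokens, source_tokens):
    """Length of the longest contiguous token sequence present in both lists."""
    best = 0
    prev = [0] * (len(source_tokens) + 1)
    for out_tok in output_tokens:
        curr = [0] * (len(source_tokens) + 1)
        for j, src_tok in enumerate(source_tokens, start=1):
            if out_tok == src_tok:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

def flags_verbatim_copying(model_output, source_text, threshold=50):
    """True if the output reproduces `threshold` or more source tokens in a row."""
    return longest_verbatim_run(model_output.split(), source_text.split()) >= threshold

# Example: a short paraphrase shares only a few consecutive tokens with its source.
source = "It was the best of times, it was the worst of times"
output = "The opening line contrasts the best of times with the worst of times"
print(longest_verbatim_run(output.split(), source.split()))  # 4
print(flags_verbatim_copying(output, source))                # False
```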

No Automatic Right to a Licensing Market for AI Training

Plaintiffs claimed that fair use would harm potential licensing markets for AI training data. Both courts rejected this argument, holding that copyright holders do not have an inherent right to control or monetize licensing specifically for AI training.

Judge Chhabria emphasized that loss of a hypothetical licensing market cannot be used to defeat fair use claims, as this would create circular reasoning favoring rightsholders in every case. Judge Alsup echoed this view, stating such a market is not guaranteed under copyright law.

This reinforces that fair use exists to allow uses “sufficiently orthogonal” to the original, enabling knowledge progress without requiring permission or payment in every instance.

Use of Shadow Libraries Does Not Void Fair Use

Both cases involved AI companies using pirated copies from shadow libraries for training data. Plaintiffs argued this tainted the fair use defense. The courts disagreed, separating the legality of data acquisition from the legality of the training use itself.

While fair use protected the transformative training, the act of acquiring pirated works remains infringing. Judge Alsup was clear that downloading from pirate sites when lawful alternatives exist is “irredeemably infringing,” even if the subsequent use is transformative.

This distinction emphasizes that responsible data sourcing is essential. Companies must still address copyright liability related to how they obtain training materials, even if the training use qualifies as fair use.

Key Difference: Market Dilution Theory

Judge Chhabria gave significant weight to the fourth fair use factor, market effect, and discussed a “market dilution” theory. Under this theory, AI-generated content could flood the market and reduce demand for original works, although the plaintiffs offered little evidence to support that outcome.

Judge Alsup rejected this notion, noting that copyright law protects against market substitution, not competition or dilution. The Supreme Court has consistently emphasized that creative works can coexist and compete without infringing.

This emerging market dilution argument, while currently unsupported, may influence future litigation and policy discussions, warranting close attention from legal professionals.

Policy Implications: Need for Public Datasets

These rulings point to the need for publicly accessible, lawfully assembled datasets for AI training. If lawful training data can be obtained only through costly acquisition, AI development will be confined to well-funded companies, risking a concentration of power.

Shared public datasets can support a wider range of developers and researchers, fostering innovation while respecting copyright. This approach advances the constitutional goal to “promote the progress of science and useful arts.”

Conclusion

These early decisions offer a cautiously optimistic view of fair use in AI training. They underscore the importance of transformative use, of judging models by their outputs rather than their internal memorization, and of rejecting unfounded licensing-market claims. However, responsible data acquisition remains critical to avoiding copyright liability.

While more litigation is expected, these rulings demonstrate that careful, fact-based analysis can uphold fair use principles in this evolving field. Legal professionals should monitor developments closely as courts continue to define the boundaries of AI and copyright law.