Judge allows copyright infringement claims against Nvidia over AI training data to proceed

A federal judge ruled copyright infringement claims against Nvidia can proceed, after three novelists alleged the company trained AI models on nearly 200,000 pirated books. The case could affect how AI firms build their training datasets.

Published on: May 07, 2026

Federal Judge Allows Copyright Infringement Claims Against Nvidia to Proceed

A federal judge has largely sided with three novelists who sued Nvidia for training artificial intelligence systems on pirated books, rejecting most of the company's arguments to dismiss the case. U.S. District Judge Jon Tigar allowed claims for direct and contributory copyright infringement to move forward in the Northern District of California on Tuesday.

The authors - Brian Keene, Abdi Nazemian and Stewart O'Nan - filed suit more than two years ago, alleging Nvidia used pirated copies of their works to train large language models. The decision could shape how AI companies acquire the massive datasets required to build their systems.

The Dataset at the Center of the Case

Nvidia trained multiple models on a dataset called "The Pile," which included Books3, a subcollection of nearly 200,000 pirated books sourced from a shadow library called Bibliotik. The authors say their copyrighted works appeared in Books3, which made up 12% of The Pile.

Nvidia countered by submitting a screenshot of a model card from its own website suggesting that one model, Megatron 345M, was trained only on portions of The Pile that excluded Books3. Tigar declined to consider the document at the pleadings stage, noting that doing so could lead to the dismissal of potentially valid claims before the plaintiffs obtain evidence through discovery.

Without the model card, Tigar found the authors plausibly connected their works to the training data used in Megatron 345M and other models in Nvidia's Megatron line.

Scripts Designed to Download Pirated Material

The authors also claimed Nvidia provided customers - including Writer, Persimmon AI Labs and Amazon - with scripts specifically designed to automatically download and preprocess The Pile for their own AI development.

Nvidia argued the broader NeMo Megatron Framework had substantial non-infringing uses and was never marketed as a copyright infringement tool. Tigar drew a sharp distinction: the question was not whether the platform as a whole could be used legitimately, but whether these specific scripts had any other purpose.

"The scripts are alleged to have no other purpose than to speed up the process of infringement," Tigar wrote.

The judge also sided with the authors on whether Nvidia knew what customers were doing with the tools. The authors' complaint identified concrete instances of infringement by named customers, which Tigar found sufficient at this stage to support the allegation of knowledge.

One Claim Falls Short

Tigar dismissed the authors' vicarious infringement claim, which requires showing a defendant had both the right to control infringing conduct and a direct financial interest in it. The authors' argument that Nvidia could control customers' conduct once they independently accessed The Pile was too vague, the judge found.

The authors also failed to show that access to infringing material served as a draw for customers, rather than just an added benefit. Tigar gave the plaintiffs 21 days to amend this claim.

Broader Implications

The case is one of several copyright lawsuits targeting AI companies' training practices. The Joseph Saveri Law Firm, which represents these authors, also represents writers suing OpenAI over similar concerns about how that company acquired its training data.

The ruling leaves core questions unresolved until the case proceeds to discovery, where both sides can obtain evidence about what data Nvidia actually used and what the company knew about its customers' practices.
