Federal Judge Allows Copyright Infringement Claims Against Nvidia to Proceed
A federal judge has largely sided with three novelists who sued Nvidia for training artificial intelligence systems on pirated books, rejecting most of the company's arguments for dismissing the case. On Tuesday, U.S. District Judge Jon Tigar of the Northern District of California allowed the authors' claims for direct and contributory copyright infringement to move forward.
The authors - Brian Keene, Abdi Nazemian and Stewart O'Nan - filed suit more than two years ago, alleging Nvidia used pirated copies of their works to train large language models. The decision could shape how AI companies acquire the massive datasets required to build their systems.
The Dataset at the Center of the Case
Nvidia trained multiple models using a dataset called "The Pile," which included Books3, a subcollection of nearly 200,000 pirated books sourced from the shadow library Bibliotik. Books3 made up 12% of The Pile, and the authors say their copyrighted works appeared in it.
Nvidia countered by submitting a screenshot of a model card from its own website suggesting that one model, Megatron 345M, was trained only on portions of The Pile that excluded Books3. Tigar declined to consider this document at the pleadings stage, noting that doing so could dismiss potentially valid claims before the plaintiffs obtain evidence through discovery.
Setting the model card aside, Tigar found that the authors had plausibly connected their works to the training data used for Megatron 345M and other models in Nvidia's Megatron line.
Scripts Designed to Download Pirated Material
The authors also claimed Nvidia provided customers - including Writer, Persimmon AI Labs and Amazon - with scripts specifically designed to automatically download and preprocess The Pile for their own AI development.
Nvidia argued the broader NeMo Megatron Framework had substantial non-infringing uses and was never marketed as a copyright infringement tool. Tigar drew a sharp distinction: the question was not whether the platform as a whole could be used legitimately, but whether these specific scripts had any other purpose.
"The scripts are alleged to have no other purpose than to speed up the process of infringement," Tigar wrote.
The judge also sided with the authors on whether Nvidia knew what customers were doing with the tools. Their complaint identified concrete instances of infringement by named customers, which Tigar found sufficient to establish knowledge.
One Claim Falls Short
Tigar dismissed the authors' vicarious infringement claim, which requires showing that a defendant had both the right to control the infringing conduct and a direct financial interest in it. The judge found too vague the authors' argument that Nvidia could control its customers' conduct after those customers independently accessed The Pile.
The authors also failed to show that access to infringing material served as a draw for customers, rather than just an added benefit. Tigar gave the plaintiffs 21 days to amend this claim.
Broader Implications
The case represents one of several copyright lawsuits targeting AI companies' training practices. The Joseph Saveri Law Firm, which represents these authors, also represents writers suing OpenAI over similar concerns about how the company acquired training data.
The ruling leaves core questions unresolved until the case proceeds to discovery, where both sides can obtain evidence about what data Nvidia actually used and what the company knew about its customers' practices.