Court Allows Copyright Lawsuit Against Nvidia Over Book Training Data
Nvidia must defend itself against claims that it trained its NeMo Megatron large language model on copyrighted books without permission. A federal judge in California ruled Tuesday that three authors, Abdi Nazemian, Brian Keene, and Stewart O'Nan, presented enough evidence to move forward with their lawsuit.
Judge Jon S. Tigar found the authors plausibly showed their books were included in the dataset used to train Megatron 345M, Nvidia's language model. The ruling preserves their claims for contributory infringement, though Tigar dismissed the vicarious infringement claims without prejudice, meaning the authors could refile them later.
The court also rejected Nvidia's attempt to dismiss claims about infringement involving unidentified models beyond Megatron.
What This Means for Writers
The decision signals that courts will allow copyright cases against AI companies to proceed to trial, at least when authors can show their specific works were likely used in training datasets. This case joins a growing number of lawsuits questioning whether AI developers have the right to use published material without compensation.
For writers, the outcome suggests courts may require companies to demonstrate they obtained permission or relied on fair use before training models on copyrighted text. The distinction matters: if the lawsuit succeeds, it could establish that developers bear responsibility for what data they use.
Learn more about the legal and technical dimensions of generative AI in our guide to Generative AI and LLM, or explore how legal professionals are approaching these emerging issues in our AI for Legal guide.