Copyright Law May Not Be the Right Tool to Regulate AI Training
More than 70 copyright infringement lawsuits have been filed against AI companies in the past year. Disney, Universal, and Warner Bros. sued Midjourney and other AI firms. The New York Times sued OpenAI. Anthropic paid $1.5 billion to settle a case involving pirated works used for training data.
Courts have sent mixed signals on what counts as illegal use of copyrighted material by AI. This year could produce decisions that reshape how U.S. copyright law applies to generative AI.
Rebecca Tushnet, a Harvard Law School professor specializing in copyright and trademark law, has a contrarian view: copyright law is poorly suited to regulate AI, and changing it could harm creators and future innovation.
How Fair Use Works in AI Training
Fair use allows some unauthorized uses of copyrighted works when they serve society: parody, satire, academic commentary, and research all qualify.
"Big data" uses often qualify as fair use because they generate new insights with different purposes than the original works. Training an AI model on hundreds of thousands of works to produce something new and useful can fall into this category, Tushnet said.
When AI Training Crosses the Line
If a trained model produces infringing copies as outputs every time it runs, that violates copyright law. If it never produces infringing copies, fair use likely applies.
The gray area is models that sometimes generate infringing copies. Here, what causes the infringing output matters. If it results from the model's training itself, with copies emerging readily from innocuous prompts, that suggests infringement. If a user deliberately prompts the model to generate an infringing copy, the fault lies with the user, not the model.
Two district courts ruled this way last year, and Tushnet said this approach is correct.
Market Substitution Isn't Copyright's Concern
Some argue that AI-generated news summaries undercutting newspaper subscriptions constitute copyright harm. This reasoning misunderstands how copyright works.
Record players put restaurant musicians out of work, but that wasn't copyright infringement. When a Google Books search answers a factual question (when was FDR born?) without requiring purchase of the source book, copyright doesn't care. Copyright protects expression, not facts.
Competition among non-infringing alternatives is healthy. It gives audiences more choices.
What Models Actually Store
No evidence suggests that most trained models contain copies of their training materials. Some models, however, can be prompted to generate near-perfect copies of specific works, such as the first Harry Potter book or 1984.
If a model can be easily prompted to generate a decent copy, courts will likely treat the model as containing a "copy" of that work. But this probably applies to only a handful of widely available works and must be evaluated case-by-case.
The more fundamental issue: the training process itself involves making and using digital copies. That requires either fair use or another legal justification, regardless of whether the resulting model contains copies.
Should AI Companies Pay for Training Data?
Tushnet opposes most generative AI on its merits but argues copyright law is the wrong regulatory tool.
Fair use under U.S. law is deliberately flexible and has handled AI issues well so far. Ruling against AI companies on fair use grounds would create a new "training market" for copyright owners, and that threatens both existing fair uses and unknown future innovations.
Copyright owners can always claim they want licensing fees for any use they choose to control, including commentary and criticism. Rejecting fair use would shift money to large publishers and studios, but history shows this doesn't help artists.
Even if copyright owners give one-time payments to current authors, future contracts won't include additional AI training payments, or will offer minimal amounts-similar to Spotify's per-stream payouts to musicians.
The real problem artists face isn't lack of legal rights. It's lack of market power.