Anthropic is facing a new copyright lawsuit from more than 100 authors who allege the company used pirated books to train its Claude AI system. The complaint lands as a federal judge in California issues a split ruling in a separate case against the same company, one that begins to draw legal lines around AI training while leaving the hardest questions unresolved.
A new lawsuit targets Anthropic's training data
The authors claim Anthropic relied on unlawfully obtained digital copies of their work from online sources to build training datasets. They say the company knew or should have known the books were pirated. The suit joins a growing stack of litigation from writers, publishers, and rights holders challenging how AI developers collect the massive text corpora that power generative models.
Court draws a line on fair use
In the California federal case, the judge ruled that training an AI model on legally acquired books can qualify as fair use under U.S. copyright law. The judge described the process as "transformative": the model learns patterns and relationships in language, not the books themselves. That ruling keeps the core training activity within fair use doctrine for now, but it does not end the dispute.
Piracy questions remain unresolved
The court separated a second, critical issue: allegations that Anthropic downloaded and stored millions of pirated books. That question was not settled in the fair-use ruling and will proceed to a separate trial. Legal experts note that courts are breaking AI disputes into narrower questions rather than issuing one sweeping decision on the technology. The piracy claims, if proven, could still expose Anthropic to significant copyright liability.
Meanwhile, parallel cases involving music publishers and other creative works are testing whether the same fair-use principles extend to song lyrics and other copyrighted material. Together, these cases are building the early legal framework for generative AI in the United States.
Why this matters for legal professionals
For lawyers and in-house counsel advising AI developers or rights holders, the Anthropic rulings clarify one thing and complicate another. The fair-use finding on legally obtained books offers a partial shield, but it comes with a sharp edge: how training data is sourced now matters enormously. The piracy claims heading to trial signal that due diligence on dataset provenance could become a central liability question, one that contract terms, warranties, and indemnification clauses will need to address long before appellate courts weigh in.