Publishers vs AI: Who's Winning the Copyright Fights So Far

Courts are drawing lines on AI training and copying. Echo someone's prose and you're in trouble; where you trained and how transformative your use is can decide the case.

Published on: Dec 04, 2025


AI developers trained models on vast stores of journalism, books, and images without permission. Courts are now drawing lines. The early picture: output-level copying is risky, jurisdiction matters, and "transformative" use is doing most of the heavy lifting for some AI defenses.

At a glance

  • US publishers vs Cohere (SDNY): Win for publishers. Motion to dismiss denied; output similarity plausibly alleged.
  • Getty vs Stability AI (UK): Win for AI. No UK training meant no direct copyright liability; limited trademark issues only.
  • Thomson Reuters vs Ross Intelligence (D. Del.): Win for publishers. Westlaw headnotes protectable; no fair use on summary judgment.
  • GEMA vs OpenAI (Germany): Win for publishers. Training found to reproduce works; TDM exception did not apply; damages ordered.
  • Authors vs Meta (N.D. Cal.): Win for AI. Fair use on summary judgment; the judge cautioned against a broad reading.
  • Authors vs Anthropic (N.D. Cal.): Win for AI. Fair use for training; a separate "central library" issue settled before trial.

US publishers vs Cohere

Fourteen news and magazine publishers defeated Cohere's motion to dismiss in November. The court said the complaint plausibly alleges outputs that are "quantitatively and qualitatively similar" to the publishers' content.

Key point: facts can be republished, but copying expression crosses a line. The court noted examples where Command allegedly produced near-verbatim material, creating a jury issue on infringement and on Cohere's knowledge of it.

Getty Images vs Stability AI (UK)

Getty failed to secure a UK ruling against training on its images. The claim fell because Getty could not show the model was trained in the UK. A secondary infringement theory also failed: the court found a model that does not store or reproduce copyright works is not an "infringing copy" under UK law.

Trademark claims partially succeeded for older outputs, but the judge found no basis for additional damages. Practical lesson: training location and technical evidence about what a model stores are outcome drivers. Getty continues its US case, where the training more plausibly took place within the forum.

Thomson Reuters vs Ross Intelligence

On summary judgment, the court held Westlaw headnotes are protectable because they distill and explain judicial opinions. Ross's use was commercial and not transformative; "actual copying" of 2,243 summaries was so obvious no reasonable jury could find otherwise.

Ross's defenses (innocent infringement, copyright misuse, merger, scènes à faire) all failed. Some issues proceed, and an appeal is pending. For legal databases and premium publishers, this decision strengthens protection for curated editorial value.

GEMA vs OpenAI (Germany)

A German court found ChatGPT's training infringed copyright in song lyrics and awarded damages. The court said TDM exemptions did not apply because training involved reproducing works, not just extracting information.

This is a significant European data point: training that reproduces copyrighted text may require consent and licensing. The ruling has been cited as strengthening journalists' position for textual works as well.

Authors vs Meta

A federal judge granted partial summary judgment for Meta, finding fair use for training Llama on books and calling the use transformative. The court also noted the plaintiffs did not show substantial market harm on the record presented.

Caution from the bench: the judge stressed this should not be read as a blanket pass for all LLM training. Stronger evidence of market harm could flip outcomes in future cases.

Authors vs Anthropic

The court held training Claude on books was fair use and transformative because Claude did not reproduce creative elements or a specific author's identifiable style. Separate issue: Anthropic's creation and retention of a "central library" raised infringement concerns, but the case settled before trial.

Net: training was protected; internal data handling still poses exposure. Documentation and retention policies matter.

What these rulings mean for legal teams

  • Outputs vs inputs: Liability risk spikes when outputs closely mirror protected expression. Evidence of near-verbatim outputs survived a motion to dismiss in SDNY.
  • Jurisdiction is decisive: UK claims failed where training did not occur in the forum. Map data flows, training locations, and model versions early.
  • Model architecture matters: Courts are parsing whether a model "stores" works. Technical affidavits on weights, embeddings, and retrieval pipelines can make or break secondary infringement theories.
  • Transformative use: US courts split. Some see LLM training as transformative; others reject fair use when the defendant competes with the rights holder's market (e.g., legal research).
  • Market harm evidence: Expect courts to probe displacement risk. Build or attack the record on subscription loss, licensing markets, and substitution.
  • TDM limits in the EU: Exceptions have boundaries. Where training reproduces works, consent and licensing may be required.
  • Trademark spillover: Even if training avoids copyright liability, branding artifacts in outputs can still prompt claims, though damages require proof of scope and persistence.
  • Data governance: Separate training, evaluation, and any cached "libraries." Retention and provenance controls reduce exposure.
  • Licensing strategy: Where risk is high (high-value editorial or databases), negotiated licenses or dataset whitelisting can be more cost-effective than litigation.

Open questions to watch

  • NYT vs OpenAI/Microsoft: Will a US court squarely address training on news and the weight of market harm?
  • Appeals in Ross and any follow-on disputes over "editorial" value vs facts.
  • Further EU decisions testing the scope of TDM after the GEMA ruling.
  • Technical defenses: How courts treat retrieval-augmented systems, deduplication, and memorization controls.


The throughline is simple: facts are free, expression is not. If your model (or your competitor's) starts echoing protected text, the risk is real, regardless of how clever the training pipeline looks on paper.

