AI's early fair-use wins may strengthen journalists' hand
Two federal courts have given generative AI companies an early boost: training large language models (LLMs) on copyrighted books can be a fair use. That stings for authors and other creators. But for news organizations, these rulings open a clearer path to argue market harm and secure licensing leverage.
Here's what matters for legal teams inside media companies - and how to act on it.
The rulings that set the tone
Bartz v. Anthropic. Authors alleged Anthropic copied pirated books, scanned lawfully purchased print books into digital form, built a permanent library, and used those texts to train its LLM. The court called the training use "quintessentially" and "spectacularly transformative," emphasizing the model learned from books to generate new text rather than replace the books themselves. It also found that digitizing lawfully purchased titles for analysis and search was a fair, transformative use.
The exception: creating and keeping pirated copies to build a permanent library was not reasonably necessary to a transformative use. That permanence looked like an unauthorized archive, so those claims survived and moved forward (the parties later entered settlement discussions). Notably, the authors didn't plausibly allege that model outputs reproduced their books - the court hinted that proof of output copying could change the result.
Kadrey v. Meta. Authors sued over Meta's training of LLaMA on their books. The court agreed the training purpose was "highly transformative," but warned that purpose alone cannot outweigh market injury. It flagged the fourth factor - market harm - as "the single most important factor." Because the record lacked evidence of economic harm, the court found fair use and dismissed speculative output-copying claims.
Why news content is different
The Kadrey court spotlighted a critical point for publishers: an LLM that generates accurate current-events content could seriously erode the print news market. That's a different risk profile than with novels. If users get timely summaries or updates directly in an AI interface, they may skip a publisher's site or cancel subscriptions - classic substitution.
There's also an ongoing dependency. LLMs need fresh reporting to stay accurate. A single training run is a snapshot; staying current on the news requires repeated retraining or retrieval over fresh, verified reporting. That recurring need is leverage.
What these cases mean for fair use arguments
Both courts treated training as transformative. The split was over weight: Bartz leaned harder on transformation; Kadrey elevated market harm. The practical takeaway is simple: you'll likely win (or at least survive dismissal) where you can credibly prove economic injury or a meaningful threat of substitution. Without that showing, transformation will carry the day for defendants.
Expect more courts to use Bartz and Kadrey as a roadmap. Outcomes will turn on evidence, especially on the fourth factor. For reference, see 17 U.S.C. § 107 and the U.S. Copyright Office's fair use guidance.
Action plan for in-house counsel at news organizations
- Build the market-harm record. Track traffic diversion correlated with AI answer boxes and chat interfaces. Document subscriber churn, ad revenue declines, and search referral shifts tied to AI features. Preserve historical baselines and run time-bound comparisons where possible.
- Capture real substitution. Log instances where AI outputs summarize, paraphrase, or update your reporting closely enough to replace a click-through. Preserve prompts, outputs, timestamps, and links to your original articles.
- Preserve ingestion evidence. Monitor and document scraping patterns, bot IPs, cache behavior, and any violations of your terms or robots directives. Keep screenshots and server logs.
- Highlight recency dependence. Show how accurate responses in your beat require your current reporting (e.g., elections, finance, public safety). Emphasize the ongoing feed, not just historical archives.
- Separate issues cleanly. Distinguish claims about unlawful copying to build datasets or archives from claims about fair use in training. Preserve output-copying claims only with concrete examples.
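To make the ingestion-evidence record concrete, counsel can ask engineering for something as simple as the sketch below, which tallies hits from known AI crawler user agents in standard access logs. The bot signatures, log format, and sample entries are illustrative assumptions; your infrastructure team should substitute the actual list and format you serve.

```python
import re
from collections import Counter

# Hypothetical list of AI crawler user-agent substrings to flag;
# the real list changes over time and should be maintained with engineering.
AI_BOT_SIGNATURES = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended"]

# Assumes Apache/NGINX combined log format:
# host - - [timestamp] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def tally_ai_crawlers(log_lines):
    """Count hits per (bot, source IP) so counsel can preserve a scraping record."""
    hits = Counter()
    for line in log_lines:
        m = LOG_PATTERN.match(line)
        if not m:
            continue  # skip malformed lines rather than guess
        for bot in AI_BOT_SIGNATURES:
            if bot in m.group("ua"):
                hits[(bot, m.group("ip"))] += 1
    return hits

# Illustrative sample entries (IPs are documentation-range placeholders).
sample = [
    '203.0.113.7 - - [01/Jul/2025:12:00:00 +0000] "GET /politics/story HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.2 - - [01/Jul/2025:12:00:05 +0000] "GET / HTTP/1.1" '
    '200 1024 "-" "Mozilla/5.0 (X11; Linux)"',
]
print(tally_ai_crawlers(sample))
```

Paired with preserved raw logs and screenshots, even a simple tally like this turns "they scraped us" into dated, per-bot, per-IP evidence.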
Licensing: convert leverage into predictable revenue
Litigation is slow and uncertain. If you can demonstrate market harm risk, you're in a stronger position to negotiate licenses that give AI companies certainty and you recurring value.
- Scope and purpose. Define covered content (text, headlines, images, archives, real-time feeds). Specify use cases: training, fine-tuning, retrieval-augmented generation, and output display.
- Real-time access. Price current-events feeds higher; enforce embargo windows for exclusives. Consider tiered latency (e.g., 5-15 minute delays) to protect subscriptions.
- Usage controls. Ban caching beyond defined windows. Limit model retention and derivative dataset creation. Require deletion on notice and end-of-term.
- Attribution and linking. Mandate clear source attribution with live links. Require respectful presentation (no truncation that misleads).
- Safety and provenance. Require audit logs, dataset lineage, and compliance with your terms. Include audit rights and third-party verification.
- Payment structure. Blend minimum guarantees with usage-based fees (tokens, calls, active users) and premiums for recency, exclusivity, or high-value beats.
- Enforcement. Include kill switches, suspension rights for breach, and liquidated damages for unauthorized retention or re-use.
- Indemnities and risk. Secure IP indemnity, data security commitments, and regulatory cooperation provisions.
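The payment-structure bullet above reduces to a simple blended formula: a guaranteed floor plus metered usage, scaled by a recency premium. A minimal sketch, with placeholder rates rather than market figures:

```python
def license_fee(monthly_tokens, base_minimum=50_000.0,
                per_million_tokens=20.0, recency_premium=1.25):
    """Monthly fee blending a guaranteed minimum with usage-based charges.

    All rates here are illustrative placeholders, not market benchmarks:
    - base_minimum: the floor the licensee owes regardless of usage
    - per_million_tokens: metered rate on licensed content consumed
    - recency_premium: multiplier for real-time or current-events feeds
    """
    usage = (monthly_tokens / 1_000_000) * per_million_tokens * recency_premium
    # The licensee pays whichever is greater: the floor or metered usage.
    return max(base_minimum, usage)
```

The `max()` structure is the point: the publisher gets predictable minimum revenue in slow months, while heavy consumption of fresh reporting scales the bill upward.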
Technical and policy levers to support your legal position
- Access governance. Tighten API terms, rate limits, and authentication to track enterprise use versus public scraping.
- Robots directives and metatags. Maintain clear crawl policies and audit compliance; log violations to support claims.
- Content watermarking and fingerprinting. Use identifiers to detect reuse in outputs and datasets.
- Terms of use. Make training and caching restrictions explicit; require consent for model building; reserve audit rights.
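Fingerprinting can be as lightweight as hashing overlapping word "shingles" of each article and checking how many reappear in an AI output or dataset. The sketch below illustrates the idea; the shingle size, truncated SHA-256 hashes, and threshold you set on the overlap ratio are all assumptions to tune, not a standard.

```python
import hashlib

def shingle_fingerprints(text, k=8):
    """Hash every overlapping k-word shingle of an article into a set."""
    words = text.lower().split()
    prints = set()
    for i in range(len(words) - k + 1):
        shingle = " ".join(words[i:i + k])
        # Truncated SHA-256 keeps the index compact; collisions are negligible here.
        prints.add(hashlib.sha256(shingle.encode()).hexdigest()[:16])
    return prints

def overlap_ratio(article_prints, candidate_prints):
    """Fraction of the article's shingles that reappear in a candidate text."""
    if not article_prints:
        return 0.0
    return len(article_prints & candidate_prints) / len(article_prints)
```

A high ratio against a chatbot answer or a recovered training file is the kind of concrete, reproducible reuse evidence the courts in both cases signaled they want to see.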
What to watch next
Courts will keep probing two questions: does training cause measurable market injury for news, and do outputs recreate protected expression at a level that competes with the original? The early rulings turned on thin market-harm records; better-developed evidence could shift outcomes. The most immediate wins will come from strong evidence and disciplined licensing strategy.
Round one went to AI. With a documented case for market harm and a clear path to ongoing data licenses, round two can favor the newsroom.