Why The New York Times Is Suing Perplexity AI: A Practical Brief for Legal Teams
The New York Times has sued Perplexity AI for allegedly using Times journalism without permission to train models and generate outputs that closely track its reporting. This case sits at the fault line between copyright, data scraping, and AI product design. If you advise a newsroom, a tech company, or an AI vendor, the implications are immediate and material.
What's at issue
The Times claims Perplexity scraped paywalled and restricted content, then produced summaries or near-verbatim outputs that echo the paper's style, structure, and facts. It also alleges that users could be misled about sources, reducing traffic, ad revenue, and subscription value. Perplexity denies wrongdoing, saying it uses publicly accessible data, respects access controls, and generates transformative summaries.
- Alleged unauthorized scraping of paywalled or restricted Times content.
- Outputs that allegedly reproduce or closely mirror Times journalism.
- Attribution concerns that could mislead users about information origin.
- Competitive and economic harm from fewer click-throughs and subscriptions.
Perplexity's position
- Data collection aligns with industry practices and website policies.
- Training relies on publicly accessible materials.
- Outputs are positioned as transformative summaries, not copies.
- Open to collaboration and licensing discussions with publishers.
The court will grapple with whether training on, and generating from, news content qualifies as fair use, and how far web scraping can go when paywalls and access restrictions are in place.
The legal questions likely to decide the case
- Copyright infringement and fair use: Purpose and character (commercial vs. transformative), nature of the works, amount and substantiality, and market effect will be front and center. For a quick refresher on the factors, see the U.S. Copyright Office's fair use overview.
- Scraping and access controls: The weight courts give to robots.txt, paywalls, and terms of service when bots ingest content at scale.
- Attribution and consumer perception: Whether presentations that summarize "the web" while leaning on specific sources create confusion about origin or endorsements.
- Economic harm: Evidence of traffic diversion, subscription impact, and substitution effects from AI answers.
- Transparency obligations: How much disclosure AI companies owe regarding datasets and output provenance, given ongoing regulatory moves in the U.S. and Europe.
Why this matters for publishers and AI companies
- Loss of traffic and revenue: Detailed AI answers can reduce downstream visits to original reporting.
- Attribution risk: Weak or absent citations diminish credit to reporters and brands.
- Competitive pressure: Aggregated answers can outrun original sources on convenience and reach.
- Compensation models: Expect pressure for licenses that mirror prior deals with social and search platforms.
Practical checklist for AI product counsel
- Data sourcing audit: Map training, fine-tuning, and retrieval sources. Flag paywalled, restricted, or disallowed domains.
- Access control compliance: Enforce robots.txt, rate limits, and terms of service. Maintain logs that show compliance.
- Licensing strategy: Identify high-value publishers for opt-in licenses. Track rights scope (training vs. output use vs. display).
- Output safeguards: Implement similarity thresholds, paraphrase constraints, and blocklists to prevent near-verbatim reproduction.
- Attribution and links: Provide clear citations and source links where feasible to reduce confusion and support referral traffic.
- Dataset transparency: Prepare defensible disclosures about data categories and curation policies.
- Records and evidence: Preserve crawl logs, dataset manifests, fine-tuning configs, and evaluation results for litigation readiness.
- Human review: Route sensitive domains or investigative reporting through stricter filters and escalation paths.
- Governance: Create a cross-functional review board for high-risk content and publisher requests.
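The output-safeguards item above is the most mechanical of these controls. A minimal sketch of one way to flag near-verbatim reproduction before an answer ships: compare word n-grams (shingles) between the generated output and a candidate source, and escalate when overlap crosses a threshold. The shingle size and threshold here are illustrative assumptions, not values from the case or any vendor's product, and production systems would layer on fuzzier matching.

```python
# Sketch of an output safeguard: flag model answers whose word n-gram
# overlap with a source article exceeds a threshold. Parameters are
# illustrative assumptions, not values from any real product.

def ngrams(text: str, n: int = 8) -> set:
    """Return the set of word n-grams (shingles) in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(output: str, source: str, n: int = 8) -> float:
    """Fraction of the output's n-grams that also appear in the source."""
    out_shingles = ngrams(output, n)
    if not out_shingles:
        return 0.0
    return len(out_shingles & ngrams(source, n)) / len(out_shingles)

def flag_near_verbatim(output: str, source: str,
                       n: int = 8, threshold: float = 0.3) -> bool:
    """Flag outputs that reproduce long runs of source wording."""
    return overlap_ratio(output, source, n) >= threshold
```

A flagged answer might be rewritten with a paraphrase constraint, shortened to a brief excerpt with attribution, or routed to human review; either way, retaining the similarity scores supports the records-and-evidence item above.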
Action items for publishers and in-house counsel
- Technical controls: Update robots.txt, paywall rules, and bot detection. Monitor traffic patterns from known crawlers and headless agents.
- Contract terms: Tighten ToS to address scraping, AI training, and automated access explicitly. Consider API-based licensing with audit rights.
- Evidence capture: Document instances of regurgitation, brand confusion, and traffic substitution. Preserve timestamps and prompts.
- Licensing menu: Define clear offers (training, summaries, excerpts, display) with pricing tied to scope and attribution.
- Public guidance: Provide a rights and permissions page that states allowed uses, prohibited uses, and contact paths.
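The technical-controls item above can begin with something as simple as crawler directives. A hypothetical robots.txt that disallows known AI crawlers while leaving general access open might look like the following; the user-agent tokens shown are the ones these vendors have publicly documented, but strings change, so verify current values before relying on them. Note that robots.txt is advisory, not an access control, which is why the brief pairs it with bot detection and contract terms.

```
# Illustrative robots.txt for a publisher limiting AI crawlers.
# Verify current user-agent tokens against each vendor's documentation.
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
```

Because compliance is voluntary, publishers typically cross-check server logs against these directives; documented violations feed directly into the evidence-capture item above.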
Potential outcomes and what they mean
- Settlement + license: Payment and guardrails on training and outputs; could become a template for other publishers.
- Ruling for The Times: Tighter limits on training with paywalled or premium journalism; stronger incentives to license.
- Ruling for Perplexity: Broader latitude for scraping and training; more pressure on publishers to negotiate distribution-friendly terms.
- Mixed outcome: Some uses allowed, others restricted, with remedies focused on output controls and attribution.
- Industry standards: Even without a final judgment, expect voluntary norms around dataset disclosures and source credit.
Regulatory context to track
U.S. lawmakers continue to explore updates to copyright for AI training and outputs. In Europe, policymakers have advanced rules touching transparency, safety, and data governance for AI systems. One reference point: the consolidated text of the EU AI Act on EUR-Lex, which many teams use to benchmark compliance programs.
What happens next
Expect extended discovery into data pipelines, crawler behavior, training sets, and output similarity. Motions will focus on fair use, market effects, and technical safeguards around paywalled content. Parallel to the court schedule, watch for licensing deals that reduce risk exposure and set pricing signals for the rest of the market.
Skill up your team
If your mandate includes AI policy, governance, or product review, upskilling the legal function pays off quickly. Curated programs organized by job role can help counsel track practical AI issues and controls.