NYT Sues Perplexity AI Over Alleged Unauthorized Use of Millions of Articles
The New York Times filed suit against Perplexity AI on December 5, alleging the startup copied, distributed, and surfaced millions of Times articles without permission to train and run its AI products. The complaint says the tools also produce false claims that are presented as Times reporting and sometimes display the newspaper's registered trademarks alongside those outputs.
A Times spokesperson said the company supports ethical AI development but rejects unlicensed use of its journalism. The suit seeks damages, injunctive relief, and other remedies to stop the alleged misuse.
Where the case was filed
The action was brought in the U.S. District Court for the Southern District of New York (SDNY), reportedly more than a year after a cease-and-desist letter was sent.
Key allegations
- Reproduction and distribution of Times content, including material behind paywalls, to train and operate AI systems.
- Outputs that include inaccuracies presented as Times reporting, paired with the Times' trademarks, which the Times argues mislead users.
- Use of Times journalism despite prior notice to stop, according to the complaint.
Perplexity's response
Perplexity has pushed back on the claims. Its communications lead said publishers are using legal tactics to suppress new technologies, and stated that the company does not scrape data to build its core model but instead indexes public pages and provides citations.
Related legal pressure
- The Chicago Tribune filed a similar case a day earlier.
- Reddit sued Perplexity in October in New York federal court, alleging unlawful data access.
- Perplexity also faces suits from Encyclopedia Britannica, Dow Jones, and the New York Post.
The Times has licensed content before, including a deal to make articles available for Amazon's Alexa. It is also in a dispute with OpenAI over related issues. Reports note ongoing friction between publishers and AI companies over the use of copyrighted material and over adherence to standard web protocols intended to prevent large-scale data collection.
What legal teams should watch
- Copyright infringement claims: Alleged reproduction and distribution of protected works at both the training and output layers. Expect arguments over fair use, transformative purpose, and the distinction between model training and verbatim or near-verbatim output.
- Trademark and false designation: Displaying a publisher's mark near AI output that contains errors can raise claims under the Lanham Act for source confusion or false endorsement.
- Access and contract theory: Use of paywalled or restricted content may implicate terms of service and compliance with site access rules. Alleged ignoring of technical exclusion signals can strengthen publisher claims.
- Remedies: Injunctions to limit training or outputs, takedown requirements, model or data deletion or quarantine, statutory damages (which can be enhanced if willfulness is shown), and ongoing oversight.
- Discovery posture: Plaintiffs will press for training data lineage, retrieval pipelines, output caching, and logs. Defendants will try to cabin discovery to protect models and trade secrets.
Practical steps for in-house counsel and litigators
- Inventory all training and retrieval data sources; document licenses, opt-outs, and enforcement of site access rules.
- Assess output filtering and attribution: avoid presenting third-party brands or mastheads with AI text; add clear source labeling and disclaimers where appropriate.
- Develop a licensing playbook for premium and paywalled content, including audit rights and revocation mechanics.
- Tighten vendor requirements: pass-through obligations for data provenance, logging, and deletion protocols.
- Prepare for preservation and discovery on model versions, datasets, retrieval systems, and prompt/output logs.
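For teams documenting enforcement of site access rules, the check can be made repeatable in code. The following is a minimal sketch, assuming those rules are expressed in a robots.txt file (the reporting above does not name a specific protocol). The crawler name ExampleAIBot, the example.com URLs, and the robots.txt content are hypothetical placeholders; only Python's standard urllib.robotparser module is used.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example publisher site.
# A real audit would use the file actually served by each source.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /premium/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def audit(robots_txt: str, user_agent: str, urls: list[str]) -> dict[str, bool]:
    """Map each URL to whether the given crawler may fetch it."""
    return {url: is_allowed(robots_txt, user_agent, url) for url in urls}


if __name__ == "__main__":
    urls = [
        "https://example.com/news/story",
        "https://example.com/premium/story",
    ]
    for agent in ("ExampleAIBot", "OtherBot"):
        print(agent, audit(ROBOTS_TXT, agent, urls))
```

Storing the result of such a check next to each entry in the data-source inventory gives counsel a dated record of what the exclusion rules said for a given crawler. It does not settle whether access was lawful or licensed.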
Bottom line: this case could influence how courts view training data, retrieval-augmented generation, and brand use in AI outputs. If you work with AI products, or face them on the other side, assume scrutiny of data sourcing, permissions, and how outputs are presented to users.