eData Edge
November 11, 2025
Courts Are Starting To Define What "AI Discovery" Means
Generative AI is now built into research, drafting, and coding across enterprises. That means prompts, outputs, and logs are showing up in discovery requests. A recent ruling from the Southern District of New York offers a clear signal: apply ordinary discovery rules - relevance and proportionality - to AI data.
The Ruling That Set the Tone
On September 19, 2025, the U.S. District Court for the Southern District of New York issued a key decision in In re OpenAI, Inc. Copyright Infringement Litigation, the case brought by The New York Times against OpenAI and Microsoft. The court declined to compel production of The New York Times' internal AI prompts and outputs. Why? They weren't tied to the core copyright issues, and even if they had marginal relevance, the volume and privilege-review burden weren't proportional to the needs of the case.
Back in May, the court had ordered OpenAI to preserve all output logs going forward, sparking concern that companies might need to keep every AI interaction indefinitely. The September decision clarified that preservation must remain targeted and defensible. Later proceedings ended the broad, going-forward preservation mandate.
The takeaway: apply the same standards you use for email, chat, and cloud data. Relevance and proportionality under FRCP 26(b)(1) still govern AI discovery.
What AI Content Might Be Discoverable
- Prompts and outputs: What users typed or uploaded, and what the tool returned - including AI-generated summaries, transcripts, or drafts that relate to the claims or defenses.
- Limited metadata and logs: Timestamps, tool or model identifiers, request IDs, and basic audit data (access, settings) can be relevant.
- Typically out of scope: System-wide logs, training data, or non-case-specific telemetry, unless they directly bear on the issues and are within the party's control.
- Contracts matter: Vendor terms may affect access and production (ownership, confidentiality, notice requirements). Review them early.
A Practical Preservation Playbook
Preserve with intent, not with a vacuum cleaner. Document your approach and be ready to explain it.
- Identify the who and the what: Which custodians touched the issues? Which AI tools did they use? What types of data went in, and what came out?
- Map the where: Determine storage locations for prompts, outputs, chat histories, summaries, drafts, and minimal logs (e.g., vendor dashboards, admin consoles, exportable audit trails).
- Target scope: Preserve prompts, outputs, and minimal logs tied to the dispute - not every AI interaction across the company.
- Document settings: Note model versions, retention settings, and access controls in effect during the relevant period.
- Account for privilege: Anticipate privilege screens if legal teams used AI for strategy, fact analysis, or draft work product.
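The targeting steps above amount to a filter over exported AI logs: keep entries that match the relevant custodians, topics, and dates, and nothing else. A minimal sketch follows, assuming a hypothetical export schema (custodian, date, entry text); real vendor exports will differ, and keyword screens would be negotiated, not hard-coded.

```python
from datetime import date

# Hypothetical exported AI log entries (schema is illustrative only).
entries = [
    {"custodian": "alice", "date": date(2025, 3, 4),
     "text": "Summarize the Acme supply contract dispute"},
    {"custodian": "bob", "date": date(2025, 3, 6),
     "text": "Draft a birthday message for my mother"},
    {"custodian": "alice", "date": date(2024, 12, 1),
     "text": "Acme contract indemnification clause"},
]

CUSTODIANS = {"alice", "carol"}                    # custodians tied to the issues
TOPIC_TERMS = ("acme", "contract")                 # agreed topic keywords
START, END = date(2025, 1, 1), date(2025, 6, 30)   # relevant period

def in_scope(entry: dict) -> bool:
    """True if the entry matches custodian, date range, and topic terms."""
    return (entry["custodian"] in CUSTODIANS
            and START <= entry["date"] <= END
            and any(t in entry["text"].lower() for t in TOPIC_TERMS))

preserved = [e for e in entries if in_scope(e)]
print(len(preserved))  # → 1: only Alice's in-period Acme entry survives
```

The point of documenting the filter itself is defensibility: the custodian list, terms, and dates become the written record of why each entry was kept or excluded.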
Negotiating AI Discovery
AI-related discovery is highly fact-dependent. In many matters, requests for broad AI data will be objectionable; in others, a narrow slice will be appropriate.
- Lead with relevance and proportionality: If prompts, outputs, or logs don't hit the issues, object. That's consistent with the rules and emerging case law.
- Narrow the aperture: Limit by custodian, topic, and date range. Consider phased discovery, sampling, or producing metadata first.
- Quantify burden: Translate review needs into numbers - entries, estimated privilege hits, review hours, and cost. Courts respond to concrete data.
- Address personal use: Employees sometimes use workplace AI tools for personal tasks. That raises privacy issues and adds review burden - another reason to tighten scope.
- Use vendor levers: Some platforms enable admin exports, retention controls, or audit reports. Know what's available before you negotiate or commit.
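The "quantify burden" point lends itself to a quick back-of-the-envelope model before the meet-and-confer. The figures below are hypothetical placeholders, not benchmarks - plug in your own entry counts, review speeds, and rates:

```python
# Rough burden estimate to support a proportionality argument.
# All inputs are hypothetical placeholders for illustration.
entry_count = 18_000             # AI log entries for the custodian
privilege_rate = (0.20, 0.30)    # projected privilege-hit range
docs_per_hour = 60               # assumed first-pass review speed
rate_per_hour = 65.0             # assumed blended reviewer rate (USD)

review_hours = entry_count / docs_per_hour
low_hits = int(entry_count * privilege_rate[0])
high_hits = int(entry_count * privilege_rate[1])
cost = review_hours * rate_per_hour

print(f"Review hours: {review_hours:.0f}")                    # 300
print(f"Projected privilege hits: {low_hits}-{high_hits}")    # 3600-5400
print(f"Estimated first-pass cost: ${cost:,.0f}")             # $19,500
```

Even a simple model like this turns "too burdensome" into concrete numbers a court can weigh against the data's marginal relevance.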
What This Means for Legal Teams
- Same standards, new data: Treat AI content like any other ESI. Relevance and proportionality are your north star.
- Preservation is independent: Even if you expect to object to production, identify custodians and sources and take reasonable steps to preserve potentially relevant AI data.
- Precision over volume: Courts expect targeted, documented decisions - not blanket "save everything" or "save nothing" approaches.
- Governance pays dividends: Clear policies on authorized tools, retention settings, and approved use cases make discovery cleaner and cheaper.
Sample Meet-and-Confer Positions
- Scope proposal: For named custodians A, B, and C, produce prompts and outputs related to Topics X and Y for Dates 1-2, plus minimal logs (timestamp, model, request ID). Exclude non-substantive system telemetry and training data.
- Phasing: Phase 1: produce metadata/log fields and a 5-10% sampled set of entries for privilege calibration. Phase 2: expand only if Phase 1 shows material relevance.
- Burden detail: "Custodian A has ~18,000 entries; initial privilege pass projects 20-30% hits; estimated 250-350 review hours at $X/hour."
- Privacy filter: Exclude entries marked personal or outside business scope; apply agreed keyword screens for personal content.
Action Checklist
- Inventory approved AI tools, retention defaults, and admin export options.
- Update legal hold templates to address prompts, outputs, and minimal logs.
- Define privilege protocols for AI-assisted legal work product.
- Review vendor contracts for data access, ownership, and notice clauses.
- Train key custodians on scoping, privilege hygiene, and personal-use boundaries.
Bottom Line
"AI discovery" isn't a new regime. It's the same rules applied to new sources. Object where data isn't connected to the claims or defenses. When it is and the scope is proportional, produce it like any other ESI. Keep preservation targeted, documented, and defensible.
Want practical upskilling for your team? See role-specific AI programs at Complete AI Training.
© 2025 All Rights Reserved. This post is a general summary of legal issues and does not constitute legal advice. Consult counsel for guidance on specific facts.