India proposes pay-on-revenue licensing for AI training data
India's Department for Promotion of Industry and Internal Trade (DPIIT) has floated a plan that lets AI developers train on lawfully accessed content now and pay later, only after the model starts earning revenue. The goal is simple: keep innovation moving while ensuring creators get paid.
The proposal comes from a Committee on Generative AI and Copyright. It rejects a zero-price license model, arguing it would weaken incentives for human creators and depress future output. At the same time, it recognizes that one-off licensing talks can stall smaller players and raise costs across the board.
The three-part model
- Blanket training license: AI developers can use all lawfully accessed content for training without individual negotiations.
- Royalties after commercialization: Payments start only when the AI system generates revenue. Rates are set by a government-appointed committee and subject to judicial review.
- Centralized collection and payout: A single mechanism handles royalties to cut admin costs and add legal certainty for large firms, startups, and MSMEs.
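The pay-on-revenue trigger can be illustrated with a minimal sketch. The 2% rate and the function name `royalty_due` are assumptions made purely for illustration; the proposal leaves actual rates to a government-appointed committee and does not publish a formula.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical rate for illustration only; the proposal leaves actual
# rates to a government-appointed committee.
ASSUMED_RATE = Decimal("0.02")


def royalty_due(revenue: Decimal, rate: Decimal = ASSUMED_RATE) -> Decimal:
    """Return the royalty owed for one reporting period.

    Nothing is owed until the AI system earns revenue, which mirrors
    the "train now, pay after commercialization" trigger.
    """
    if not Decimal("0") <= rate <= Decimal("1"):
        raise ValueError("rate must be between 0 and 1")
    if revenue <= 0:
        return Decimal("0.00")
    return (revenue * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(royalty_due(Decimal("0")))        # pre-revenue model: 0.00
    print(royalty_due(Decimal("5000000")))  # commercialized model: 100000.00
```

Under these assumptions, a model that is still in development owes nothing, and the obligation scales with revenue once the system is commercialized.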
New institutions: CRCAT and a works registry
The paper proposes a nonprofit collective, the Copyright Royalties Collective for AI Training (CRCAT), run by rightsholder associations and designated by the government. A "Works Database for AI training royalties" would let creators register to receive payments through CRCAT.
This mirrors established collecting societies that manage music or reprographic rights in many countries. The model is familiar, which makes it easier to implement and explain.
Why this matters for government
- Clarity for procurement: A blanket training license with a pay-on-revenue trigger reduces uncertainty in public tenders involving AI.
- Support for smaller firms: Startups and MSMEs get predictable access to data without upfront licensing hurdles.
- Multilingual coverage: India's 22 scheduled languages and fragmented media market make one-to-one licensing impractical. A collective model scales better.
- Judicial oversight: Committee-set rates with review help balance creator interests with national AI goals.
Implementation to-do list for ministries and agencies
- Define "lawfully accessed" content: Clarify scraping, terms-of-service, public domain, open licenses, and user-consented data.
- Set the revenue trigger: Specify what counts as commercialization (API usage, embedded features, ads, subscriptions, on-prem licensing) and define thresholds.
- Rate-setting method: Publish a transparent formula (e.g., revenue share, usage bands, sector-based tariffs) and review cycles.
- Scope boundaries: Separate "training" from "fine-tuning," "evaluation," and "inference." Address synthetic data and derivative models.
- Works Database design: Simple onboarding for creators in all major languages, with identity checks, metadata standards, and privacy safeguards.
- Allocation logic: Decide how to apportion royalties when training data is untraceable or mixed (statistical proxies, sampling, or declared usage).
- Compliance and audit: Require usage logs, model cards, and revenue attestations. Build tools to detect fraud and double-claims.
- Dispute resolution: Fast-track processes for opt-outs, conflicts, and appeals. Keep costs low for individual creators.
- Cross-border issues: Plan for foreign rightsholders and reciprocal arrangements with overseas collecting societies.
- Public sector readiness: Issue procurement guidance so departments can assess vendor compliance without slowing projects.
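Two of the items above, the rate-setting formula and the allocation logic, lend themselves to a small worked sketch. The one below assumes a marginal revenue-band tariff and a pro rata split by declared usage weights. The bands, rates, weights, and function names are all hypothetical; the proposal only lists revenue share, usage bands, and statistical proxies as options without choosing among them.

```python
from decimal import Decimal, ROUND_DOWN

# Hypothetical marginal bands: (upper revenue bound, rate). None = no upper bound.
ASSUMED_BANDS = [
    (Decimal("10000000"), Decimal("0.01")),
    (Decimal("100000000"), Decimal("0.02")),
    (None, Decimal("0.03")),
]


def banded_royalty(revenue, bands=ASSUMED_BANDS):
    """Apply each rate only to the slice of revenue inside its band."""
    owed = Decimal("0")
    lower = Decimal("0")
    for upper, rate in bands:
        if revenue <= lower:
            break
        top = revenue if upper is None else min(revenue, upper)
        owed += (top - lower) * rate
        if upper is None:
            break
        lower = upper
    return owed.quantize(Decimal("0.01"))


def allocate(pool, weights):
    """Split a royalty pool (in rupees) pro rata by weight, in whole paise.

    Shares are rounded down, then leftover paise go to the largest
    fractional remainders so the payouts sum exactly to the pool.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    paise = int((pool * 100).to_integral_value(rounding=ROUND_DOWN))
    exact = {k: Decimal(paise) * Decimal(w) / Decimal(total)
             for k, w in weights.items()}
    floors = {k: int(v) for k, v in exact.items()}
    leftover = paise - sum(floors.values())
    # Ties on the remainder are broken by key so the result is deterministic.
    by_remainder = sorted(exact, key=lambda k: (exact[k] - floors[k], k),
                          reverse=True)
    for k in by_remainder[:leftover]:
        floors[k] += 1
    return {k: Decimal(v) / 100 for k, v in floors.items()}


if __name__ == "__main__":
    pool = banded_royalty(Decimal("50000000"))
    print(pool)
    print(allocate(pool, {"news": 5, "books": 3, "code": 2}))
```

The largest-remainder step is one design choice among several. Its purpose here is to guarantee that rounding never leaves money unallocated or overpays the pool, which matters once payouts are audited.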
What this could mean in practice
- For AI providers: Easier access to training data up front, with costs tied to real revenue later. Compliance becomes a line item, not a blocker.
- For creators: A pathway to recurring income based on verifiable commercial use, without chasing individual deals.
- For regulators: A single dashboard to track payments, disputes, and uptake, plus levers to adjust rates as the market evolves.
Open questions to resolve
- Will creators have an opt-out for training uses? If so, how is it enforced?
- How will open-source and public-domain content be treated in the mix?
- What about user-generated content where platforms have varying licenses?
- How are pre-existing models treated: are they grandfathered, or do they owe back-payments?
- Will rates vary by sector (news, books, images, code) or usage intensity?
- How will multilingual weighting be handled across India's major languages?
Context and international parallels
The approach borrows from music and reprographic collecting societies used in many markets. It attempts to avoid drawn-out one-off negotiations while keeping compensation for rightsholders in place. If royalty rates are predictable and the registry is simple, large tech firms may accept the tradeoff, especially in a market as big and language-diverse as India.
Next steps for policy teams
- Draft a clear definition of commercialization and revenue thresholds.
- Publish a consultative paper on rate methodologies and review timelines.
- Prototype the Works Database with APIs for bulk registration and verification.
- Coordinate with states and media bodies for multilingual outreach.
- Prepare procurement templates that require vendor compliance declarations.
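For the Works Database prototype, a bulk-registration check could look roughly like the sketch below. It assumes a record keyed by a SHA-256 fingerprint of the work, so that repeat submissions and competing claims can be flagged. The schema, field names, and rejection reasons are hypothetical; the paper does not define a data model or an API.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkRecord:
    # All field names are hypothetical; the paper does not define a schema.
    claimant_id: str
    title: str
    language: str
    content_sha256: str  # fingerprint of the work, not the work itself


def fingerprint(content: bytes) -> str:
    """Return the hex SHA-256 digest used to identify a work."""
    return hashlib.sha256(content).hexdigest()


def bulk_register(records):
    """Return (accepted, rejected), where rejected holds (record, reason)."""
    accepted = {}
    rejected = []
    for rec in records:
        if not rec.claimant_id or not rec.title:
            rejected.append((rec, "missing claimant or title"))
        elif len(rec.content_sha256) != 64:
            rejected.append((rec, "invalid fingerprint"))
        elif rec.content_sha256 in accepted:
            # Same work seen before: a repeat by the same claimant is a
            # duplicate; a different claimant signals a possible double-claim.
            owner = accepted[rec.content_sha256].claimant_id
            reason = "duplicate" if owner == rec.claimant_id else "conflicting claim"
            rejected.append((rec, reason))
        else:
            accepted[rec.content_sha256] = rec
    return list(accepted.values()), rejected
```

In this sketch the first valid claim on a fingerprint wins and later ones are only flagged. A real registry would presumably route conflicting claims to dispute resolution rather than reject them outright.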
For reference on the department leading this work, see the DPIIT website. For background on collective management models worldwide, see the WIPO overview.