AI training on trial: the next legal frontier in copyright law
Date: December 4, 2025
Key takeaways
- Courts in Canada and the U.S. will test how fair dealing/fair use applies to AI training in 2026. Outcomes will set practical guardrails for dataset sourcing, model training, and product deployment.
- Canadian consultations point to targeted amendments to the Copyright Act, including a possible text and data mining (TDM) exception, fair dealing adjustments, and clearer rules for AI training.
- Legal teams should audit training datasets, tighten vendor contracts, and implement output controls now. Waiting for final answers will be more expensive than preparing for them.
Why this matters
Modern AI trains on enormous datasets that often include protected works. That's true for large language models and even more so for multi-modal systems that learn from text, images, audio, and video.
As capability ramps up, so do questions about lawful access, data provenance, circumvention of technological protection measures (TPMs), and whether training substitutes for the original markets for those works.
Fair dealing (Canada) vs. fair use (U.S.) - the fault lines
Fair use in the U.S. is open-ended. Courts weigh statutory factors and can find fairness for new purposes if the balance supports it.
Canada's fair dealing is purpose-limited. Courts interpret the listed purposes broadly (e.g., research, private study, education, parody/satire, criticism/review, news reporting), but uses outside those buckets do not qualify. Once a listed purpose is established, Canadian courts assess fairness using factors similar to the U.S. analysis: the purpose, character, and amount of the dealing, available alternatives, the nature of the work, and market effects.
For AI training, the most sensitive issues are the purpose/character of use, the amount taken at scale, and whether the training or outputs displace demand for the originals.
Early U.S. signals
Thomson Reuters v. Ross Intelligence. The court rejected the fair use defence where training sought to build a direct market substitute using the plaintiff's content. Commercial purpose and market effect weighed heavily against the defendant, even though some factors cut the other way.
Authors v. Anthropic (Claude). The court found fair use for training on copyrighted books and digitizing lawfully acquired print copies for internal storage, but not for acquiring or keeping pirated copies. Critical to the decision: no alleged infringing outputs, safeguards to prevent infringing outputs reaching users, and no displacement of demand for the books. The parties then announced a large settlement.
Takeaway: U.S. courts are probing two pressure points - substitution risk and evidence of infringing outputs. Technical safeguards matter.
Canadian cases to watch
- MosaicML/Databricks (B.C.). Allegations include training on datasets of pirated books and removal of copyright management information.
- OpenAI (Ontario). Major Canadian publishers allege unauthorized scraping, reproduction into datasets, training use, circumvention of TPMs, and breach of website terms.
These files are early, but they're positioned to clarify how fair dealing and other Copyright Act provisions apply to training.
Government consultations - where policy may land
Since 2021, Canada has consulted on AI and copyright, including the use of protected works for training. The "What we heard" report highlights a clear split: rights holders want consent, credit, and compensation; AI developers warn that overly restrictive rules could stall innovation and competitiveness.
Expect targeted amendments in 2026 on TDM exceptions, fair dealing calibration, and clearer roles and responsibilities for model trainers and deployers. For reference, the Copyright Act is available on the Justice Laws site: Copyright Act (R.S.C., 1985, c. C-42).
What legal teams should do now
You don't need final guidance to reduce exposure. Build a defensible posture that you can explain to regulators, courts, and customers.
1) Audit and document your training data
- Inventory all datasets used for training and fine-tuning (including third-party corpora, web-scraped material, and synthetic sets).
- Record provenance: source, date, acquisition path, license/terms, TPMs encountered, and any scraping controls respected.
- Identify protected works and other subject matter: literary and artistic works, sound recordings, cinematographic works, and performers' performances.
- Flag high-risk content: pirated sources, datasets with removed copyright management information, or terms that prohibit AI use.
- Maintain a dataset manifest and chain-of-custody records that are reviewable and auditable (a minimal manifest sketch follows this list).
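One lightweight way to implement such a manifest is a structured record per dataset, appended to a reviewable log. The sketch below is illustrative only: the field names, file name, and Python representation are assumptions, not a standard that counsel or a regulator will expect.

```python
from dataclasses import dataclass, field, asdict
import json

# Illustrative manifest record; field names are assumptions, not a standard.
@dataclass
class DatasetManifestEntry:
    dataset_id: str             # internal identifier
    source_url: str             # where the data came from
    acquired_on: str            # ISO date of acquisition
    acquisition_path: str       # e.g., "licensed", "scraped", "vendor"
    license_terms: str          # license or site terms governing use
    tpms_encountered: bool      # were technological protection measures present?
    robots_txt_respected: bool  # scraping controls honoured at collection time
    risk_flags: list[str] = field(default_factory=list)  # e.g., ["possible pirated source"]

entry = DatasetManifestEntry(
    dataset_id="corpus-2025-001",
    source_url="https://example.com/archive",
    acquired_on="2025-11-14",
    acquisition_path="licensed",
    license_terms="Vendor agreement v2, AI training permitted",
    tpms_encountered=False,
    robots_txt_respected=True,
)

# Persist as JSON Lines so each record is independently reviewable and auditable.
with open("dataset_manifest.jsonl", "a") as f:
    f.write(json.dumps(asdict(entry)) + "\n")
```

Append-only records like these double as chain-of-custody evidence: each entry captures what you knew about a source at the time you ingested it.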
2) Tighten training governance and safeguards
- Adopt exclusion lists for known copyrighted works and honour rightsholder opt-out requests; respect robots.txt and site terms where applicable.
- Use sampling and hashing/similarity checks to reduce verbatim memorization and large-chunk reproduction (see the overlap-check sketch after this list).
- Implement output filters and refusal policies to prevent substantial reproductions; routinely red-team prompts that might trigger copying.
- Log prompt/output events tied to content filters to show diligence if challenged.
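For the hashing/similarity checks and filter logging above, one common starting point is a word n-gram overlap test against hashed reference text. This is a minimal sketch under stated assumptions: the n-gram length, threshold, and function names are illustrative, and a production system would use a proper index rather than an in-memory set.

```python
import hashlib
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(filename="output_filter.log", level=logging.INFO)

N = 8                   # n-gram length in words; tune to your risk tolerance (assumption)
BLOCK_THRESHOLD = 0.15  # overlap fraction that triggers a refusal (assumption)

def ngram_hashes(text: str, n: int = N) -> set[str]:
    """Hash every n-word window so the comparison never stores raw protected text."""
    words = text.lower().split()
    return {
        hashlib.sha256(" ".join(words[i:i + n]).encode()).hexdigest()
        for i in range(max(len(words) - n + 1, 1))
    }

def check_output(candidate: str, reference_index: set[str]) -> bool:
    """Return True if the candidate output passes; log the decision either way."""
    grams = ngram_hashes(candidate)
    overlap = len(grams & reference_index) / max(len(grams), 1)
    passed = overlap < BLOCK_THRESHOLD
    logging.info(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "overlap": round(overlap, 4),
        "passed": passed,
    }))
    return passed

# Usage: build the index from flagged reference works, then gate model outputs.
reference_index = ngram_hashes("some protected reference text goes here for the index")
if not check_output("model output text to screen", reference_index):
    print("Refuse or regenerate: output too close to protected material.")
```

Hashing the n-grams means the comparison index never holds protected text verbatim, so the safeguard itself does not create a new copy of the works it screens for.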
3) Update your contract playbook
- With model vendors: Demand disclosure of training sources at a useful level of granularity; negotiate indemnities for infringing outputs; require output filters and takedown processes; prohibit training on your proprietary data without explicit consent.
- With customers: Clarify rights in outputs, acceptable use, and any training on user content; allocate risk via representations, warranties, caps, and IP indemnities; set TPM-compliance and recordkeeping obligations.
- With data licensors: Secure express AI/TDM rights, retention limits, and audit rights; prohibit onward transfer where needed.
4) Prepare for disputes and regulator questions
- Stand up an incident response path for claimed copying or memorization, with rollback/patch plans for models or datasets.
- Preserve artifacts: dataset versions, training configs, safety settings, and evaluation results (a small preservation sketch follows this list).
- Create a public-facing statement that is accurate but avoids unnecessary legal exposure.
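For the artifact-preservation step, content-hashing each file at a decision point yields tamper-evident records you can later produce in discovery. This is a minimal sketch assuming artifacts live as files on disk; the paths, label, and record fields are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def preserve_artifacts(paths: list[str], label: str,
                       out: str = "preservation_log.jsonl") -> None:
    """Record a SHA-256 digest and timestamp for each artifact file."""
    with open(out, "a") as log:
        for p in paths:
            digest = hashlib.sha256(Path(p).read_bytes()).hexdigest()
            log.write(json.dumps({
                "label": label,  # e.g., "pre-release snapshot"
                "path": p,
                "sha256": digest,
                "recorded_at": datetime.now(timezone.utc).isoformat(),
            }) + "\n")

# Usage (paths are placeholders for your real artifact files):
# preserve_artifacts(["training_config.yaml", "eval_results.json"], label="model-v3 release")
```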
How the U.S. may influence Canada
Canadian courts often look at U.S. fair use reasoning for context, even with different statutory design. Two practical signals are emerging: market substitution risk will be scrutinized, and evidence of infringing outputs (or strong safeguards preventing them) can be outcome-determinative.
Expect Canadian decisions to wrestle with whether large-scale, purpose-built ingestion qualifies as "research" and how to weigh market effects at training versus output time.
Looking ahead to 2026
We should see first-wave rulings in Canadian training cases, more U.S. decisions, and movement on Canadian legislative options. The big questions: Does training fit within existing fair dealing, do we get a TDM exception, and what conditions (consent, compensation, transparency) attach?
Practical plan: keep your audit current, require clear commitments from vendors, ship with output safeguards, and document decisions like you'll have to defend them. Because you might.
Helpful resources
- Government of Canada: Copyright and Generative AI - What we heard
- Justice Laws Website: Copyright Act
Optional training for legal and compliance teams
If your team needs a fast refresher on AI fundamentals to support policy, procurement, and vendor oversight, explore role-based options here: Complete AI Training - courses by job.