Copyright vs. AI: From Scraping to Fair Dealing in Canadian Courts

AI and copyright in Canada: what counts as fair dealing, infringement, and authorship. Practical steps for counsel on datasets, outputs, platform risk, and contracts.

Published on: Nov 26, 2025

Copyright Claims in the Era of AI: What Canadian Lawyers Need to Know

Clients are asking the same questions: Can models train on copyrighted works? Do AI outputs infringe? Who owns what when humans and machines co-create? This piece gives you a clear, practical view so you can advise with confidence and spot risk early.

The issues that matter

  • Training data: Is copying for model training a reproduction requiring permission, or can it fall within fair dealing?
  • Outputs: When do generated results cross into substantial similarity and infringement?
  • Authorship and ownership: What level of human skill and judgment is required for copyright to subsist in AI-assisted works?
  • Intermediary liability: Authorization, caching/hosting exceptions, and the notice-and-notice regime.
  • Contracts and terms: Dataset licences, website terms of use, and vendor indemnities are often the fastest path to leverage.
  • Moral rights and attribution: Risk rises with output that imitates identifiable style plus recognizable elements.

Canadian legal framework in short

Canada does not have a dedicated text-and-data mining exception. Training often turns on fair dealing and the classic factors: purpose, character, amount, alternatives, nature of the work, and effect of the dealing.

Two touchstones keep appearing in briefs: the Supreme Court's fair dealing analysis in CCH v. Law Society of Upper Canada, and the Copyright Act itself, especially on user rights and intermediaries (Copyright Act, R.S.C. 1985, c. C-42).

Training data claims: where disputes start

  • Plaintiff theories: Unauthorized reproduction during scraping and dataset creation; circumvention of technical measures; breach of website terms; passing off for outputs that trade on an artist's recognizable expression.
  • Defence positions: Fair dealing for research; non-substantial copying; transient/technical copies; implied licence from publicly accessible sources; strict proof of ownership and chain of title for each asserted work.

Evidence wins cases. Maintain a clean record of data sources, filters, deduplication, and removal tools. Plaintiffs who can tie specific files to a model's weights and show more than general "influence" have leverage.
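A clean record of this kind can be as simple as a hash-stamped provenance log. The sketch below is illustrative only; the `record_provenance` helper and its field names are assumptions, not any particular toolchain, but the SHA-256 digest is the kind of artefact that later lets a party tie a specific file to a dataset exhibit.

```python
import datetime
import hashlib
import json
from pathlib import Path

def record_provenance(path: Path, source_url: str, licence: str) -> dict:
    """Hash one dataset file and record where it came from and under what terms."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return {
        "file": path.name,
        "sha256": digest,  # ties this exact file to later dataset exhibits
        "source": source_url,
        "licence": licence,
        "collected_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }

# Illustrative usage: hash a local copy and emit the record as a JSON line.
sample = Path("sample.txt")
sample.write_text("example dataset record")
entry = record_provenance(sample, "https://example.com/sample", "CC-BY-4.0")
print(json.dumps(entry))
```

Appending one such line per collected file gives counsel a contemporaneous audit trail rather than a reconstruction after litigation starts.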

Output liability: similarity and regurgitation

Style alone isn't protected. Expression is. The question is whether the output reproduces a substantial part of a protected work, qualitatively or quantitatively. Memorization or near-verbatim regurgitation is high risk; stylistic mimicry without protectable elements is lower, but still sensitive if marketing implies endorsement.

Set up tests. Prompt for edge cases and document outcomes. Where models tend to reproduce training text, deploy filtering, reference checks, and post-generation similarity scanning.
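One lightweight form of post-generation similarity scanning is word-shingle overlap between an output and a reference text. A minimal sketch; the 5-word window and any flagging threshold are arbitrary engineering choices, not a legal standard for substantial similarity:

```python
def ngram_overlap(output: str, reference: str, n: int = 5) -> float:
    """Fraction of n-word shingles in `output` that also appear in `reference`."""
    def shingles(text: str) -> set:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    out_set = shingles(output)
    if not out_set:
        return 0.0
    return len(out_set & shingles(reference)) / len(out_set)

# Near-verbatim regurgitation scores close to 1.0; unrelated text scores 0.0.
passage = ("the exclusive right to produce or reproduce the work "
           "or any substantial part thereof")
print(ngram_overlap(passage, passage))  # -> 1.0
print(ngram_overlap("a completely different sentence with no shared "
                    "phrasing at all", passage))  # -> 0.0
```

A high score is a signal to escalate for human review, not a conclusion about infringement; qualitative assessment of what was copied still governs.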

Authorship and AI-assisted works

Canadian law requires human skill and judgment for copyright to subsist. Purely machine-generated text or images are unlikely to attract protection for the user. Where a human crafts prompts, curates iterations, and edits meaningfully, there's a stronger case for human authorship in the final work.

For clients, clarify who contributes what. Use contribution logs and versioning to support ownership positions and assignment clauses.

Platform and intermediary exposure

Platforms face claims for authorization and secondary infringement, alongside contract and consumer protection theories. Technical exceptions for caching/hosting can help, but they don't cure active involvement in infringing acts.

Implement notice-and-notice processes, content filtering, opt-out honouring, and repeat-infringer policies. These are as much about litigation optics as they are about statutory defences.

Case trends to watch

  • Training-stage claims are testing whether wholesale copying for model development is compensable, excused, or something in between.
  • Output suits focus on substantial similarity, memorization, and false endorsement. Plaintiffs are pairing copyright with trademark, moral rights, and passing off.
  • Expect courts to demand granular proof: which files were copied, how they were used, and a clear link between a specific work and a specific output.

Practical playbook for counsel

  • For creators and rightsholders
    • Register key works and retain working files for proof of originality.
    • Use clear licence terms that restrict text-and-data mining where feasible; monitor dataset disclosures and model cards.
    • Collect evidence of outputs that mirror protected works; run controlled prompts and preserve logs.
  • For AI developers and platforms
    • Maintain dataset provenance, licences, and opt-out compliance; document deduplication and memorization tests.
    • Filter training sets (remove copyrighted and sensitive material where risk is high); implement output similarity and watermark checks.
    • Structure indemnities, caps, and carve-outs thoughtfully; keep audit trails for fair dealing positions.
  • For in-house teams
    • Adopt an AI use policy: approved tools, allowed inputs, disclosure rules, and review gates for public-facing content.
    • Update procurement terms: training-data disclosures, infringement warranties, takedown SLAs, and IP ownership of deliverables.
    • Prepare a response playbook: intake, preservation, assessment, and resolution of infringement notices.
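The memorization tests mentioned above can be documented with a simple probe: feed the model the opening of a known training passage and measure how much of the true continuation comes back. A sketch under stated assumptions; `generate` stands in for whatever completion API the model actually exposes:

```python
def memorization_probe(generate, training_text: str,
                       prefix_words: int = 20, n: int = 5) -> float:
    """Score 0.0-1.0: the share of the true continuation's n-word shingles
    that the model reproduces when prompted with the passage's opening."""
    words = training_text.split()
    prefix = " ".join(words[:prefix_words])
    expected = words[prefix_words:]
    produced = generate(prefix).split()

    def shingles(ws):
        return {tuple(ws[i:i + n]) for i in range(len(ws) - n + 1)}

    expected_set = shingles(expected)
    if not expected_set:
        return 0.0
    return len(expected_set & shingles(produced)) / len(expected_set)
```

Scores near 1.0 on known training passages are exactly the artefacts worth preserving, and remediating, before a dispute arises.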

Litigation tactics that move the needle

  • Plaintiffs: Lead with the cleanest exemplars. Prove access and copying with dataset exhibits, hash matches, and near-identical outputs. Add contract counts where site terms prohibit scraping.
  • Defendants: Attack substantial similarity early. Press for particularity on ownership and chain of title. Offer a credible compliance story (audits, filters, and removal mechanisms) backed by records.

What's next

Expect more clarity on fair dealing boundaries for machine learning, stronger contract-based controls around data access, and increasing court focus on empirical evidence over rhetoric. Keep briefs tight, technical appendices thorough, and client policies aligned with what you're arguing in court.

Quick checklist

  • Map data flows: source, licence, storage, filters, retention.
  • Log human contributions for AI-assisted works.
  • Test and document memorization risk and output similarity.
  • Tighten vendor terms and internal AI policies.
  • Preserve artefacts the moment a dispute is likely.

If your team needs structured upskilling on AI use cases and risk for legal roles, see this curated list: AI courses by job.

