Teaching Vision-Language Models What to Forget: Approximate Domain Unlearning for Safer, Controllable AI

ADU lets vision-language models forget risky domains while keeping what you need. Think: keep real cars, mute illustrated ones. Safety by control, not broad generalization.

Categorized in: AI News, Science and Research
Published on: Dec 15, 2025

Approximate Domain Unlearning: Safer, More Controllable Vision-Language Models

Vision-language models have become very good at recognizing concepts across styles: photos, illustrations, sketches. That strength can backfire. A model that sees an illustrated car on a billboard as a "car" could trigger the wrong response in a real-world system. The fix isn't more generalization. It's control.

A research team from Tokyo University of Science, with collaborators from AIST and the University of Oxford, presented a new approach called Approximate Domain Unlearning (ADU) at NeurIPS 2025. ADU teaches models to intentionally "forget" specified domains while keeping performance in the domains you care about. Example: keep high accuracy on real vehicles while suppressing recognition of illustrated ones.

Why this matters for applied research

Domain generalization has been the default goal for years. But safety-critical deployments need selective competence, not universal perception. ADU reframes control as a first-class objective: turn off what you don't want the model to see.

How ADU works

The hard part: domains overlap in feature space, so you can't neatly remove one without hurting the other. The team tackles this with two components.

  • Domain Disentangling Loss: Encourages separability between domains in the embedding space and captures domain-specific appearance cues within each image.
  • Instance-wise Prompt Generator: Adjusts prompts per image to suppress recognition of unwanted domains while keeping essential signals intact.

The result is controlled degradation of recognition accuracy on unwanted domains and preserved performance where it matters, as the sketch below illustrates.
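To make the two components more concrete, here is a minimal sketch, assuming a CLIP-like backbone with learnable text prompts. The names `InstancePromptGenerator` and `domain_disentangling_loss` are illustrative, not the authors' implementation; they only mirror the two ideas listed above.

```python
# Minimal sketch (NOT the authors' code): an instance-wise prompt generator
# that perturbs prompt context tokens per image, and an auxiliary loss that
# pushes per-domain image features apart in the embedding space.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InstancePromptGenerator(nn.Module):
    """Predicts a per-image offset for the learnable prompt context tokens."""
    def __init__(self, feat_dim: int, n_ctx: int = 4):
        super().__init__()
        self.n_ctx = n_ctx
        self.proj = nn.Linear(feat_dim, n_ctx * feat_dim)

    def forward(self, image_feat: torch.Tensor) -> torch.Tensor:
        # image_feat: (B, D) -> per-image prompt offsets: (B, n_ctx, D)
        b, d = image_feat.shape
        return self.proj(image_feat).view(b, self.n_ctx, d)

def domain_disentangling_loss(image_feat: torch.Tensor,
                              domain_labels: torch.Tensor,
                              temperature: float = 0.07) -> torch.Tensor:
    """Pulls each feature toward its own domain prototype and away from the
    others, encouraging separability between domains in the embedding space."""
    feats = F.normalize(image_feat, dim=-1)
    domains = domain_labels.unique()                        # sorted domain ids
    protos = torch.stack([
        F.normalize(feats[domain_labels == d].mean(0), dim=-1) for d in domains
    ])                                                      # (n_domains, D)
    logits = feats @ protos.t() / temperature               # cosine similarities
    targets = torch.searchsorted(domains, domain_labels)    # map ids -> 0..n-1
    return F.cross_entropy(logits, targets)
```

In training, an auxiliary loss like this would be combined with the usual image-text matching objective, with the class signal from the forgotten domain deliberately suppressed; the exact formulation in the paper may differ.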

What this enables

ADU supports "policy by domain." You can maintain recognition for real-world objects but block their stylized or illustrated counterparts. That flexibility helps align AI behavior with operational risk requirements, not just benchmark scores.
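As an illustration of what "policy by domain" could look like operationally, here is a hypothetical deployment configuration. The field names are invented for this example and are not part of ADU.

```python
# Hypothetical deployment policy (illustrative only): declare which domains a
# deployed model should keep recognizing and which it should have unlearned.
DOMAIN_POLICY = {
    "deployment": "driver_assistance",      # example configuration name
    "recognize": ["photo"],                 # keep full accuracy here
    "forget": ["illustration", "sketch"],   # suppress recognition here
    "max_forgotten_leakage": 0.05,          # acceptance threshold at eval time
}
```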

Practical guidance for researchers and engineers

  • Define domains explicitly: Label target vs. off-target domains up front, and decide how "stylized," "illustrated," and "synthetic" are operationally scoped.
  • Measure the right metrics: Track false positives across domains, degradation on the target domain, and leakage where the model still fires on the forbidden domain (see the evaluation sketch after this list).
  • Probe the feature space: Visualize embeddings to confirm domain separation. Look for collisions around edge cases and mixed-media inputs.
  • Stress test the boundary: Use adversarial or hybrid images (photo with illustrative overlays) to quantify where approximate unlearning breaks down.
  • Set policy toggles: Treat domain forgetting like a safety control you can enable per deployment configuration.
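The following is a minimal evaluation sketch for the metrics above; `unlearning_report` and the toy numbers are hypothetical helpers, not benchmarks from the paper.

```python
# Minimal evaluation sketch (illustrative): retained accuracy on the target
# domain vs. residual "leakage" on the forgotten domain after unlearning.
from typing import Dict, List, Tuple

def domain_accuracy(preds: List[int], labels: List[int]) -> float:
    return sum(p == y for p, y in zip(preds, labels)) / max(len(labels), 1)

def unlearning_report(results: Dict[str, Tuple[List[int], List[int]]],
                      target_domain: str,
                      forgotten_domain: str) -> Dict[str, float]:
    """results maps a domain name to (predictions, ground-truth labels)."""
    return {
        # Should stay close to the pre-unlearning baseline.
        "target_accuracy": domain_accuracy(*results[target_domain]),
        # Residual recognition on the forbidden domain; lower is better.
        "forgotten_leakage": domain_accuracy(*results[forgotten_domain]),
    }

# Toy example: real photos are kept, illustrations are forgotten.
report = unlearning_report(
    {"photo": ([0, 1, 2, 2], [0, 1, 2, 2]),
     "illustration": ([0, 1, 2, 1], [0, 1, 2, 2])},
    target_domain="photo",
    forgotten_domain="illustration",
)
print(report)  # {'target_accuracy': 1.0, 'forgotten_leakage': 0.75}
```

A threshold like the `max_forgotten_leakage` field from the earlier configuration sketch can then be enforced against `forgotten_leakage` per deployment configuration.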

Example applications

  • Driver assistance and robotics: Ignore illustrated signage or ads while preserving detection of real objects.
  • Retail and public spaces: Avoid false alerts from posters, mascots, and screens while tracking real inventory or people.
  • Medical imaging: Suppress recognition of overlays, icons, or simulations while keeping clinical features intact.

Risk perspective

Modern models create new risks by being good at everything. ADU shifts risk management from "hope the model generalizes well" to "decide where the model is allowed to generalize." That mindset supports safer, scenario-specific deployments.

Who's behind it

The work is led by Associate Professor Go Irie (Tokyo University of Science), with contributions from Kodai Kawamura, Yuta Goto, Rintaro Yanagi (AIST), and Hirokatsu Kataoka (AIST and University of Oxford). The study was presented at the 39th Conference on Neural Information Processing Systems (NeurIPS 2025).

Key takeaways

  • Generalization is valuable, but uncontrolled cross-domain recognition can be risky.
  • Approximate unlearning gives you selective forgetting at the domain level.
  • Disentangled features plus instance-wise prompts provide a practical path to control.
  • Treat domain forgetting as a configurable safety feature, not a one-off tweak.

Image caption: First proposal and realization of approximate domain unlearning: selective forgetting at the domain level for vision-language models. Image credit: Associate Professor Go Irie, Tokyo University of Science.

Source: Tokyo University of Science

