Approximate Domain Unlearning: Safer, More Controllable Vision-Language Models
Vision-language models have become remarkably good at recognizing concepts across visual styles: photos, illustrations, and sketches. That strength can backfire. A model that reads an illustrated car on a billboard as a "car" could trigger the wrong response in a real-world system. The fix isn't more generalization. It's control.
A research team from Tokyo University of Science, with collaborators from AIST and the University of Oxford, presented a new approach called Approximate Domain Unlearning (ADU) at NeurIPS 2025. ADU teaches models to intentionally "forget" specified domains while keeping performance in the domains you care about. Example: keep high accuracy on real vehicles while suppressing recognition of illustrated ones.
Why this matters for applied research
Domain generalization has been the default goal for years. But safety-critical deployments need selective competence, not universal perception. ADU reframes control as a first-class objective: turn off what you don't want the model to see.
How ADU works
The hard part: domains overlap in feature space, so you can't neatly remove one without hurting the other. The team tackles this with two components.
- Domain Disentangling Loss: Encourages separability between domains in the embedding space and captures domain-specific appearance cues within each image.
- Instance-wise Prompt Generator: Adjusts prompts per image to suppress recognition of unwanted domains while keeping essential signals intact.
The result is controlled degradation of recognition accuracy for unwanted domains and preserved performance where it matters.
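To make the two components concrete, here is a minimal PyTorch-style sketch assuming a CLIP-like backbone. The names (InstancePromptGenerator, domain_disentangling_loss), the MLP design, and the contrastive-style loss form are illustrative assumptions for exposition, not the authors' published implementation; the exact formulation in the NeurIPS paper may differ.

```python
# Sketch of ADU-style components, assuming a CLIP-like backbone.
# Class/function names and the loss form are illustrative assumptions,
# not the authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InstancePromptGenerator(nn.Module):
    """Produces per-image prompt offsets from image features (assumed design)."""
    def __init__(self, feat_dim: int, n_ctx: int, ctx_dim: int):
        super().__init__()
        self.n_ctx, self.ctx_dim = n_ctx, ctx_dim
        self.net = nn.Sequential(
            nn.Linear(feat_dim, feat_dim // 2),
            nn.ReLU(),
            nn.Linear(feat_dim // 2, n_ctx * ctx_dim),
        )

    def forward(self, image_feats: torch.Tensor) -> torch.Tensor:
        # image_feats: (B, feat_dim) -> prompt offsets: (B, n_ctx, ctx_dim)
        return self.net(image_feats).view(-1, self.n_ctx, self.ctx_dim)

def domain_disentangling_loss(feats: torch.Tensor, domain_labels: torch.Tensor,
                              margin: float = 0.5) -> torch.Tensor:
    """Pull same-domain embeddings together, push different domains apart.
    A simple contrastive-style surrogate for the paper's loss (assumption)."""
    feats = F.normalize(feats, dim=-1)
    sim = feats @ feats.T                                   # (B, B) cosine similarities
    same = domain_labels.unsqueeze(0) == domain_labels.unsqueeze(1)
    eye = torch.eye(len(feats), dtype=torch.bool, device=feats.device)
    pos = sim[same & ~eye]                                  # same-domain pairs (exclude self)
    neg = sim[~same]                                        # cross-domain pairs
    loss_pos = (1.0 - pos).mean() if pos.numel() else feats.sum() * 0.0
    loss_neg = F.relu(neg - margin).mean() if neg.numel() else feats.sum() * 0.0
    return loss_pos + loss_neg

# Tiny usage example: random features stand in for CLIP image embeddings.
B, feat_dim, n_ctx, ctx_dim = 8, 512, 4, 512
image_feats = torch.randn(B, feat_dim)
domains = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])            # 0 = photo, 1 = illustration
generator = InstancePromptGenerator(feat_dim, n_ctx, ctx_dim)
prompt_offsets = generator(image_feats)                     # per-image prompt adjustment
loss = domain_disentangling_loss(image_feats, domains)
print(prompt_offsets.shape, loss.item())
```

The point carried over from the paper is structural: prompts are adjusted per image, and the loss explicitly separates domains in the embedding space so one domain can be suppressed without dragging the other down.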
What this enables
ADU supports "policy by domain." You can maintain recognition for real-world objects but block their stylized or illustrated counterparts. That flexibility helps align AI behavior with operational risk requirements, not just benchmark scores.
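One way to operationalize "policy by domain" is a small configuration object that each deployment can toggle. The sketch below is hypothetical; the field names and domain labels are placeholders, not part of the ADU paper.

```python
# Hypothetical per-deployment domain policy; names and domains are illustrative.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DomainPolicy:
    """Per-deployment toggle: which visual domains the model may recognize."""
    target_domains: List[str] = field(default_factory=lambda: ["photo"])
    forgotten_domains: List[str] = field(default_factory=lambda: ["illustration", "sketch"])

    def allows(self, domain: str) -> bool:
        # Recognition is permitted only for explicitly targeted domains.
        return domain in self.target_domains

# A driver-assistance deployment might keep real photos and drop everything stylized.
policy = DomainPolicy()
print(policy.allows("photo"), policy.allows("illustration"))  # True False
```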
Practical guidance for researchers and engineers
- Define domains explicitly: Label target vs. off-target domains up front, and decide how "stylized," "illustrated," and "synthetic" are operationally scoped.
- Measure the right metrics: Track false positives across domains, degradation on the target domain, and leakage where the model still fires on the forbidden domain (a minimal sketch of these checks follows this list).
- Probe the feature space: Visualize embeddings to confirm domain separation. Look for collisions around edge cases and mixed-media inputs.
- Stress test the boundary: Use adversarial or hybrid images (e.g., a photo with illustrated overlays) to quantify where approximate unlearning breaks down.
- Set policy toggles: Treat domain forgetting like a safety control you can enable per deployment configuration.
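As referenced in the measurement bullet above, here is a minimal evaluation sketch assuming you have per-sample predictions, ground-truth labels, and domain tags. The function name adu_metrics and the exact metric definitions (target-domain accuracy, forbidden-domain "leakage") are illustrative choices, not metrics defined by the paper.

```python
# Minimal evaluation sketch; names and metric definitions are illustrative.
import numpy as np

def adu_metrics(preds: np.ndarray, labels: np.ndarray, domains: np.ndarray,
                target_domain: str = "photo", forbidden_domain: str = "illustration"):
    target = domains == target_domain
    forbidden = domains == forbidden_domain
    # Accuracy you want to preserve on the target domain.
    target_acc = float((preds[target] == labels[target]).mean()) if target.any() else float("nan")
    # "Leakage": how often the model still recognizes the forbidden domain correctly.
    leakage = float((preds[forbidden] == labels[forbidden]).mean()) if forbidden.any() else float("nan")
    return {"target_accuracy": target_acc, "forbidden_leakage": leakage}

# Toy example: 4 photo samples, 4 illustration samples.
preds = np.array(["car", "car", "sign", "sign", "car", "dog", "cat", "dog"])
labels = np.array(["car", "car", "sign", "dog", "car", "car", "cat", "dog"])
domains = np.array(["photo"] * 4 + ["illustration"] * 4)
print(adu_metrics(preds, labels, domains))  # {'target_accuracy': 0.75, 'forbidden_leakage': 0.75}
```

In practice, compare target_accuracy against a no-unlearning baseline to quantify degradation, and drive forbidden_leakage as low as the application tolerates.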
Example applications
- Driver assistance and robotics: Ignore illustrated signage or ads while preserving detection of real objects.
- Retail and public spaces: Avoid false alerts from posters, mascots, and screens while tracking real inventory or people.
- Medical imaging: Suppress recognition of overlays, icons, or simulations while keeping clinical features intact.
Risk perspective
Modern models create new risks by being good at everything. ADU shifts risk management from "hope the model generalizes well" to "decide where the model is allowed to generalize." That mindset supports safer, scenario-specific deployments.
Who's behind it
The work is led by Associate Professor Go Irie (Tokyo University of Science), with contributions from Kodai Kawamura, Yuta Goto, Rintaro Yanagi (AIST), and Hirokatsu Kataoka (AIST and University of Oxford). The study was presented at the 39th Conference on Neural Information Processing Systems (NeurIPS 2025).
Key takeaways
- Generalization is valuable, but uncontrolled cross-domain recognition can be risky.
- Approximate unlearning gives you selective forgetting at the domain level.
- Disentangled features plus instance-wise prompts provide a practical path to control.
- Treat domain forgetting as a configurable safety feature, not a one-off tweak.
Image caption: First proposal and realization of approximate domain unlearning: selective forgetting at the domain level for vision-language models. Image credit: Associate Professor Go Irie, Tokyo University of Science.
Source: Tokyo University of Science