Why AI image generators don't just copy: creativity from constraints

Diffusion models seem creative because local, shift-consistent denoising produces emergent global patterns. A lean analytic model, the equivariant local score (ELS) machine, matches trained systems closely and offers a handle on the fidelity-coherence balance.

Categorized in: AI News Science and Research
Published on: Sep 28, 2025

Where diffusion models' creativity actually comes from

Image generators are trained to mimic data. Yet they produce novel, coherent images that weren't in the dataset. That gap between "copy" and "new" has looked mysterious for years. A new study argues it's no mystery at all: it's a direct side effect of how these systems denoise.

Local rules, global novelty

Diffusion models work by adding noise to images and then removing it step by step. Two practical constraints shape that process: locality and translational equivariance. The model predicts and corrects noise one small patch at a time (locality), and if you shift the input, the output shifts the same way (equivariance).
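Both constraints can be made concrete with a minimal sketch. The toy denoiser below is just a patch-wise mean filter standing in for a learned score network (an assumption for illustration, not the paper's model), but it has exactly the two properties described: each output pixel depends only on a small neighborhood (locality), and shifting the input shifts the output identically (translational equivariance, exact here because of periodic padding).

```python
import numpy as np

def local_denoise_step(img, kernel_size=3):
    """Toy denoising step: each pixel is corrected using only a small
    neighborhood (locality). A mean filter stands in for a learned
    patch-level score model."""
    k = kernel_size // 2
    # Periodic ("wrap") padding keeps shift-equivariance exact.
    padded = np.pad(img, k, mode="wrap")
    out = np.empty_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + kernel_size, j:j + kernel_size].mean()
    return out

rng = np.random.default_rng(0)
noisy = rng.normal(size=(16, 16))

# Equivariance check: shifting then denoising equals denoising then shifting.
shifted_then_denoised = local_denoise_step(np.roll(noisy, 3, axis=1))
denoised_then_shifted = np.roll(local_denoise_step(noisy), 3, axis=1)
print(np.allclose(shifted_then_denoised, denoised_then_shifted))  # True
```

Swapping the mean filter for any convolution preserves both properties; that is why convolutional backbones inherit them by construction.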

These constraints keep structure intact, but they also mean the model doesn't "see" the whole image when it makes each local decision. As in a biological system following local rules, global patterns emerge from those local decisions. That emergence is where the apparent creativity comes from.

A simple model that predicts creativity

Mason Kamb and Surya Ganguli built an analytic system, the equivariant local score (ELS) machine, that encodes only those two ingredients: locality and equivariance. No training, just equations. They then compared ELS outputs to trained diffusion models (including UNet and ResNet backbones) on denoising tasks and found a striking match, reportedly around 90% on average.

Translation: much of what we call "creativity" in diffusion models is deterministically produced by their architecture and denoising dynamics. If you impose local patch prediction plus equivariance, you get novel configurations that weren't literal copies of the training set.
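To convey the flavor of an analytic, training-free score, here is a heavily simplified sketch (my own toy construction, not the paper's exact equations): estimate the clean version of a noisy patch as a likelihood-weighted average over a pool of training patches. Pooling patches from all spatial positions is the equivariance assumption (any local pattern may occur anywhere); operating on one small patch at a time is the locality assumption.

```python
import numpy as np

def els_style_patch_estimate(noisy_patch, train_patches, sigma=0.5):
    """Toy equivariant-local-score idea: the clean-patch estimate is a
    posterior-weighted average of training patches, weighted by a
    Gaussian likelihood of the observed noisy patch. No training loop;
    everything is computed in closed form from the patch pool."""
    diffs = train_patches - noisy_patch            # (N, p, p)
    sq_dist = (diffs ** 2).sum(axis=(1, 2))        # squared distances
    log_w = -sq_dist / (2 * sigma ** 2)
    w = np.exp(log_w - log_w.max())                # stable softmax
    w /= w.sum()
    return (w[:, None, None] * train_patches).sum(axis=0)

rng = np.random.default_rng(1)
train_patches = rng.normal(size=(50, 3, 3))        # pooled local patches
noisy = train_patches[7] + 0.1 * rng.normal(size=(3, 3))
est = els_style_patch_estimate(noisy, train_patches)
print(np.abs(est - train_patches[7]).max())        # small: near the clean patch
```

Run patch by patch over a whole image, this kind of rule recombines familiar local pieces into globally new configurations, which is the mechanism the study identifies.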

Why hands get extra fingers

Early AI art often produced extra fingers. That's a clean example of local decisions lacking global context. The model refined each patch to look like a finger, but without full-scene awareness, it overproduced local features. The result was anatomically coherent patches that didn't reconcile into a correct whole.

From denoising to Turing-like patterning

This behavior mirrors Turing patterns in morphogenesis: complex forms arise from simple local interactions without a master blueprint. In diffusion, the "score" guiding denoising acts like a digital analog of those local rules, assembling global structure from patch-level corrections.
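The analogy is easy to see in simulation. The Gray-Scott reaction-diffusion system below (a standard Turing-pattern demo with commonly used parameters, not anything from the diffusion-model study) uses only pointwise reactions and a nearest-neighbor Laplacian, yet a tiny central seed grows into structured global patterns.

```python
import numpy as np

def gray_scott(steps=2000, n=64, Du=0.16, Dv=0.08, F=0.035, k=0.065):
    """Gray-Scott reaction-diffusion: purely local updates (5-point
    Laplacian diffusion plus pointwise reaction terms) that produce
    global spot/stripe patterns with no master blueprint."""
    U = np.ones((n, n))
    V = np.zeros((n, n))
    # Seed a small square perturbation in the center.
    U[n//2-4:n//2+4, n//2-4:n//2+4] = 0.50
    V[n//2-4:n//2+4, n//2-4:n//2+4] = 0.25

    def lap(Z):  # periodic Laplacian: each cell sees only its neighbors
        return (np.roll(Z, 1, 0) + np.roll(Z, -1, 0)
                + np.roll(Z, 1, 1) + np.roll(Z, -1, 1) - 4 * Z)

    for _ in range(steps):
        UVV = U * V * V
        U += Du * lap(U) - UVV + F * (1 - U)
        V += Dv * lap(V) + UVV - (F + k) * V
    return U, V

U, V = gray_scott()
print(V.std())  # nonzero spread: structure has grown beyond the seed
```

The parallel to denoising is loose but instructive: in both cases, no rule ever references the global picture, yet a coherent global picture appears.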

Implications for research

If ELS can predict trained-model behavior, we can forecast failure modes and steer outputs by changing local rules, patch sizes, or equivariance strength. That gives practitioners a handle to trade off fidelity, diversity, and coherence without blind hyperparameter sweeps. It also suggests that architectural constraints, not just data scale, drive generative novelty.

There's a neuroscience angle, too. Human creativity could share a similar structure: local assembly of familiar parts under consistent transformation rules, producing new combinations that feel original. Different substrate, similar principle.

Open questions

Large language models also appear creative, yet they don't rely on the same locality and equivariance as image diffusion. Do they express an analogous mechanism in sequence space, or something different? How far can ELS-style analytic models go before we need full training dynamics to explain novelty?

What to test next

  • Systematically vary patch size and stride during denoising. Measure effects on semantic coherence and artifact rates.
  • Relax or strengthen translational equivariance. Test how much global consistency depends on it.
  • Probe cross-model predictability. Can an ELS-style engine anticipate outputs from different backbones and schedulers?
  • Induce controlled failures (e.g., extra fingers) and verify whether ELS pre-predicts their location and frequency.
  • Compare with language and audio generators: define "locality" and "equivariance" analogs in 1D sequences and assess predictive power.
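For the equivariance experiments above, the first step is a measurement. A simple empirical metric (my own construction, applicable to any denoiser callable) is the gap between shift-then-denoise and denoise-then-shift; zero means the model is exactly translation-equivariant under periodic shifts, and larger values quantify how much the constraint has been relaxed.

```python
import numpy as np

def equivariance_error(denoiser, img, shift=5):
    """Empirical shift-equivariance gap for an arbitrary denoiser:
    max | denoise(shift(x)) - shift(denoise(x)) |.
    Zero means perfect translational equivariance (periodic shifts)."""
    a = denoiser(np.roll(img, shift, axis=1))
    b = np.roll(denoiser(img), shift, axis=1)
    return float(np.abs(a - b).max())

rng = np.random.default_rng(2)
x = rng.normal(size=(32, 32))

# A convolution-style blur commutes with periodic shifts...
blur = lambda z: (np.roll(z, 1, 0) + np.roll(z, -1, 0) + z) / 3
# ...while a position-dependent gain (breaking equivariance) does not.
gain = np.linspace(0.5, 1.5, 32)[None, :]
biased = lambda z: z * gain

print(equivariance_error(blur, x))    # 0.0
print(equivariance_error(biased, x))  # clearly nonzero
```

Tracking this number while varying patch size, stride, or architecture gives the bullets above a concrete dependent variable to report.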

Why this matters for your lab

If creativity emerges from constraints, you can engineer it. Architect for the kind of novelty you want, then validate with an analytic predictor like ELS before expensive training runs. That shortens iteration cycles and turns "black box vibe" into testable design choices.


Bottom line: diffusion models don't sidestep their constraints to be creative. The constraints produce the creativity. Control the local rules, and you shape the outcomes.