Why AI image generators don't just copy: creativity from constraints

Diffusion models seem creative because local, shift-consistent denoising produces emergent global patterns. A lean analytic model, the equivariant local score (ELS) machine, matches trained systems closely and offers a handle on the fidelity-coherence balance.

Categorized in: AI News Science and Research
Published on: Sep 28, 2025

Where diffusion models' creativity actually comes from

Image generators are trained to mimic data. Yet they produce novel, coherent images that weren't in the dataset. That gap between "copy" and "new" has looked mysterious for years. A new study argues it's no mystery at all: it's a direct side effect of how these systems denoise.

Local rules, global novelty

Diffusion models work by adding noise to images and then removing it step by step. Two practical constraints shape that process: locality and translational equivariance. The model predicts and corrects noise one small patch at a time (locality), and if you shift the input, the output shifts the same way (equivariance).
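Both constraints can be made concrete with a minimal sketch. The toy denoiser below is just a patch-wise mean filter standing in for a learned score network (an assumption for illustration, not the paper's model), but it has exactly the two properties described: each output pixel depends only on a small neighborhood (locality), and shifting the input shifts the output identically (translational equivariance, exact here because of periodic padding).

```python
import numpy as np

def local_denoise_step(img, kernel_size=3):
    """Toy denoising step: each pixel is corrected using only a small
    neighborhood (locality). A mean filter stands in for a learned
    patch-level score model."""
    k = kernel_size // 2
    # Periodic ("wrap") padding keeps shift-equivariance exact.
    padded = np.pad(img, k, mode="wrap")
    out = np.empty_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + kernel_size, j:j + kernel_size].mean()
    return out

rng = np.random.default_rng(0)
noisy = rng.normal(size=(16, 16))

# Equivariance check: shifting then denoising equals denoising then shifting.
shifted_then_denoised = local_denoise_step(np.roll(noisy, 3, axis=1))
denoised_then_shifted = np.roll(local_denoise_step(noisy), 3, axis=1)
print(np.allclose(shifted_then_denoised, denoised_then_shifted))  # True
```

Swapping the mean filter for any convolution preserves both properties; that is why convolutional backbones inherit them by construction.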

These constraints keep structure intact, but they also mean the model doesn't "see" the whole image when it makes each local decision. As in a biological system following local rules, global patterns emerge from those local decisions. That emergence is where the apparent creativity comes from.

A simple model that predicts creativity

Mason Kamb and Surya Ganguli built an analytic system, the equivariant local score (ELS) machine, that encodes only those two ingredients: locality and equivariance. No training, just equations. They then compared ELS outputs to trained diffusion models (including UNet and ResNet backbones) on denoising tasks and found a striking match, reportedly around 90% on average.

Translation: much of what we call "creativity" in diffusion models is deterministically produced by their architecture and denoising dynamics. If you impose local patch prediction plus equivariance, you get novel configurations that weren't literal copies of the training set.
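To convey the flavor of an analytic, training-free score, here is a heavily simplified sketch (my own toy construction, not the paper's exact equations): estimate the clean version of a noisy patch as a likelihood-weighted average over a pool of training patches. Pooling patches from all spatial positions is the equivariance assumption (any local pattern may occur anywhere); operating on one small patch at a time is the locality assumption.

```python
import numpy as np

def els_style_patch_estimate(noisy_patch, train_patches, sigma=0.5):
    """Toy equivariant-local-score idea: the clean-patch estimate is a
    posterior-weighted average of training patches, weighted by a
    Gaussian likelihood of the observed noisy patch. No training loop;
    everything is computed in closed form from the patch pool."""
    diffs = train_patches - noisy_patch            # (N, p, p)
    sq_dist = (diffs ** 2).sum(axis=(1, 2))        # squared distances
    log_w = -sq_dist / (2 * sigma ** 2)
    w = np.exp(log_w - log_w.max())                # stable softmax
    w /= w.sum()
    return (w[:, None, None] * train_patches).sum(axis=0)

rng = np.random.default_rng(1)
train_patches = rng.normal(size=(50, 3, 3))        # pooled local patches
noisy = train_patches[7] + 0.1 * rng.normal(size=(3, 3))
est = els_style_patch_estimate(noisy, train_patches)
print(np.abs(est - train_patches[7]).max())        # small: near the clean patch
```

Run patch by patch over a whole image, this kind of rule recombines familiar local pieces into globally new configurations, which is the mechanism the study identifies.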

Why hands get extra fingers

Early AI art often produced extra fingers. That's a clean example of local decisions lacking global context. The model refined each patch to look like a finger, but without full-scene awareness, it overproduced local features. The result was anatomically coherent patches that didn't reconcile into a correct whole.

From denoising to Turing-like patterning

This behavior mirrors Turing patterns in morphogenesis: complex forms arise from simple local interactions without a master blueprint. In diffusion, the "score" guiding denoising acts like a digital analog of those local rules, assembling global structure from patch-level corrections.
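The analogy is easy to see in simulation. The Gray-Scott reaction-diffusion system below (a standard Turing-pattern demo with commonly used parameters, not anything from the diffusion-model study) uses only pointwise reactions and a nearest-neighbor Laplacian, yet a tiny central seed grows into structured global patterns.

```python
import numpy as np

def gray_scott(steps=2000, n=64, Du=0.16, Dv=0.08, F=0.035, k=0.065):
    """Gray-Scott reaction-diffusion: purely local updates (5-point
    Laplacian diffusion plus pointwise reaction terms) that produce
    global spot/stripe patterns with no master blueprint."""
    U = np.ones((n, n))
    V = np.zeros((n, n))
    # Seed a small square perturbation in the center.
    U[n//2-4:n//2+4, n//2-4:n//2+4] = 0.50
    V[n//2-4:n//2+4, n//2-4:n//2+4] = 0.25

    def lap(Z):  # periodic Laplacian: each cell sees only its neighbors
        return (np.roll(Z, 1, 0) + np.roll(Z, -1, 0)
                + np.roll(Z, 1, 1) + np.roll(Z, -1, 1) - 4 * Z)

    for _ in range(steps):
        UVV = U * V * V
        U += Du * lap(U) - UVV + F * (1 - U)
        V += Dv * lap(V) + UVV - (F + k) * V
    return U, V

U, V = gray_scott()
print(V.std())  # nonzero spread: structure has grown beyond the seed
```

The parallel to denoising is loose but instructive: in both cases, no rule ever references the global picture, yet a coherent global picture appears.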

Implications for research

If ELS can predict trained-model behavior, we can forecast failure modes and steer outputs by changing local rules, patch sizes, or equivariance strength. That gives practitioners a handle to trade off fidelity, diversity, and coherence without blind hyperparameter sweeps. It also suggests that architectural constraints, not just data scale, drive generative novelty.

There's a neuroscience angle, too. Human creativity could share a similar structure: local assembly of familiar parts under consistent transformation rules, producing new combinations that feel original. Different substrate, similar principle.

Open questions

Large language models also appear creative, yet they don't rely on the same locality and equivariance as image diffusion. Do they express an analogous mechanism in sequence space, or something different? How far can ELS-style analytic models go before we need full training dynamics to explain novelty?

What to test next

  • Systematically vary patch size and stride during denoising. Measure effects on semantic coherence and artifact rates.
  • Relax or strengthen translational equivariance. Test how much global consistency depends on it.
  • Probe cross-model predictability. Can an ELS-style engine anticipate outputs from different backbones and schedulers?
  • Induce controlled failures (e.g., extra fingers) and verify whether ELS pre-predicts their location and frequency.
  • Compare with language and audio generators: define "locality" and "equivariance" analogs in 1D sequences and assess predictive power.
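For the equivariance experiments above, the first step is a measurement. A simple empirical metric (my own construction, applicable to any denoiser callable) is the gap between shift-then-denoise and denoise-then-shift; zero means the model is exactly translation-equivariant under periodic shifts, and larger values quantify how much the constraint has been relaxed.

```python
import numpy as np

def equivariance_error(denoiser, img, shift=5):
    """Empirical shift-equivariance gap for an arbitrary denoiser:
    max | denoise(shift(x)) - shift(denoise(x)) |.
    Zero means perfect translational equivariance (periodic shifts)."""
    a = denoiser(np.roll(img, shift, axis=1))
    b = np.roll(denoiser(img), shift, axis=1)
    return float(np.abs(a - b).max())

rng = np.random.default_rng(2)
x = rng.normal(size=(32, 32))

# A convolution-style blur commutes with periodic shifts...
blur = lambda z: (np.roll(z, 1, 0) + np.roll(z, -1, 0) + z) / 3
# ...while a position-dependent gain (breaking equivariance) does not.
gain = np.linspace(0.5, 1.5, 32)[None, :]
biased = lambda z: z * gain

print(equivariance_error(blur, x))    # 0.0
print(equivariance_error(biased, x))  # clearly nonzero
```

Tracking this number while varying patch size, stride, or architecture gives the bullets above a concrete dependent variable to report.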

Why this matters for your lab

If creativity emerges from constraints, you can engineer it. Architect for the kind of novelty you want, then validate with an analytic predictor like ELS before expensive training runs. That shortens iteration cycles and turns "black box vibe" into testable design choices.


Bottom line: diffusion models don't sidestep their constraints to be creative. The constraints produce the creativity. Control the local rules, and you shape the outcomes.