Creative Image Generation, On Purpose: Steering Diffusion Models Into Rare Ideas
Most text-to-image tools repeat what they've seen. A new framework from Rutgers researchers Kunpeng Song and Ahmed Elgammal shows how to break that loop: define creativity as rarity in CLIP's embedding space, then actively aim the model at those low-probability pockets. No prompt acrobatics. No manual concept mashups. Just a principled way to get images that feel fresh while staying on-brief.
What that means in plain English
Think of CLIP's embedding space as a map of visual ideas. Common images cluster in crowded neighborhoods. Rare images live in the outskirts. This method estimates where "rare" is on that map and nudges the diffusion process in that direction, while a "pullback" keeps outputs believable and aligned with your prompt.
The team builds this with a specialized loss function that rewards exploring unlikely embeddings and a constraint that prevents the model from drifting into nonsense. You also get directional control: how far to push from the typical result, and in which direction.
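Here is a minimal NumPy sketch of the "push toward rare, then pull back" loop. It is not the authors' implementation. It assumes the embedding density is a single Gaussian with mean `mu` and covariance `cov`, and it substitutes a simple hypothetical pullback: whenever a step drops cosine similarity to the starting embedding below `min_cos`, the step is halved back toward the start. The names `steer_toward_rarity`, `step_size`, `steps`, `min_cos`, and `direction` are illustrative. `direction` stands in for the directional control described above.

```python
import numpy as np

def neg_log_density(z, mu, cov):
    """-log p(z) under a Gaussian N(mu, cov); higher means rarer."""
    diff = z - mu
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (diff @ np.linalg.solve(cov, diff) + logdet + len(z) * np.log(2 * np.pi))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def steer_toward_rarity(z0, mu, cov, step_size=0.1, steps=50, min_cos=0.8, direction=None):
    """Hypothetical sketch: gradient ascent on -log p(z) with a cosine-based pullback.

    z0        : starting embedding (e.g. the prompt's typical embedding), must be nonzero
    min_cos   : minimum cosine similarity to z0 (the "how far" dial); must be < 1
    direction : optional vector; if given, movement is restricted to it (the "which way" dial)
    """
    prec = np.linalg.inv(cov)
    z = z0.copy()
    for _ in range(steps):
        # For a Gaussian, grad of -log p(z) is prec @ (z - mu): it points away from the mode.
        grad = prec @ (z - mu)
        if direction is not None:
            u = direction / np.linalg.norm(direction)
            grad = (grad @ u) * u  # keep only the component along the chosen direction
        candidate = z + step_size * grad
        # Illustrative pullback (not the paper's): shrink the excursion toward z0
        # until the result is close enough to the starting embedding.
        while cosine(candidate, z0) < min_cos:
            candidate = z0 + 0.5 * (candidate - z0)
        z = candidate
    return z
```

Because the Gaussian gradient vanishes at the mean, a start exactly at `mu` will not move; in practice you would start from an embedding near, but not on, the mode. Raising `min_cos` shortens how far the embedding can travel from its starting point.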
Why creatives should care
It short-circuits repetition. You can brief for distinct silhouettes, textures, and moods without losing the core concept. Great for concept art, moodboards, editorial illustration, branding explorations, and product ideation where "seen it before" kills momentum.
- Write prompts that set intent: "a handbag that avoids common silhouettes; emphasis on negative space, asymmetry, unexpected materials."
- When tools expose a "creativity" or rarity control, push it incrementally and watch for drift. Keep the semantic core intact.
- Generate in batches, shortlist, then nudge directionally rather than starting from scratch each time.
- Evaluate novelty beyond FID-style scores. Use side-by-side references and a clear rubric for "new but on-brand."
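The "generate in batches, shortlist" step can be sketched as a two-stage filter. First, keep only candidates whose embedding stays close to the brief. Then rank the survivors by rarity. This is a hypothetical helper, not a feature of any particular tool. It assumes you already have an image embedding for each candidate, a text embedding for the brief, and a Gaussian (`mu`, `cov`) fitted to typical outputs. `shortlist`, `min_cos`, and `k` are illustrative names, and the default threshold is a placeholder to tune per model.

```python
import numpy as np

def shortlist(candidates, prompt_emb, mu, cov, min_cos=0.25, k=5):
    """Hypothetical batch filter: keep on-brief candidates, then rank by rarity.

    candidates : (n, d) array of image embeddings
    prompt_emb : (d,) embedding of the brief
    Rarity is the squared Mahalanobis distance from mu. For a fixed Gaussian it
    ranks candidates the same way as the negative log-likelihood does.
    """
    prec = np.linalg.inv(cov)
    diffs = candidates - mu
    rarity = np.einsum("ij,jk,ik->i", diffs, prec, diffs)
    cos = candidates @ prompt_emb / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(prompt_emb))
    on_brief = np.flatnonzero(cos >= min_cos)          # fidelity gate first
    ranked = on_brief[np.argsort(-rarity[on_brief])]   # then rarest first
    return ranked[:k]
```

Gating on fidelity before ranking on rarity preserves the "new but on-brand" priority: a very rare image that has lost the brief never makes the shortlist.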
How the researchers did it (useful if your team builds tools)
- Creativity = inverse probability in CLIP space. They estimate the probability distribution of embeddings for generated images and optimize for low-probability regions.
- A custom loss encourages rare embeddings; a pullback mechanism keeps outputs realistic and faithful to prompts.
- They used Kandinsky 2.1 (diffusion prior + UNet). Results: rare, visually striking images with controllable deviation, plus efficient generation (they report producing samples such as buildings and vehicles in about two minutes).
- Dimensionality reduction: PCA down to 50 dimensions retained over 95% of the variance, which simplifies finding rare regions.
- Gaussian fit: modeling the reduced embeddings as Gaussian, an assumption supported by the Gaussian behavior of diffusion model priors, makes probability estimation tractable.
- Novelty was also framed through information theory, considering user exposure to generated images.
- They note the approach should extend beyond Kandinsky (for example, to faster pipelines like Hyper-SD).
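The PCA-plus-Gaussian step can be sketched in a few lines of NumPy. This is an illustrative reconstruction under stated assumptions, not the authors' code. It fits PCA via SVD and treats the Gaussian in PCA space as diagonal, since PCA scores are uncorrelated in the sample. It reports rarity as surprisal, -log2 p, in bits, as a nod to the information-theoretic framing. How the paper accounts for user exposure is not modeled here. `fit_pca_gaussian` and `surprisal_bits` are hypothetical names.

```python
import numpy as np

def fit_pca_gaussian(embeddings, n_components=50):
    """Project embeddings onto n_components principal directions and fit a Gaussian there.

    Returns (mean, components, variances, explained_ratio).
    In PCA coordinates the sample covariance is diagonal, so per-component
    variances fully describe the fitted Gaussian.
    """
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Rows of vt are principal directions; s**2 / (n - 1) are their variances.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s**2 / (len(embeddings) - 1)
    explained = var[:n_components].sum() / var.sum()
    return mean, vt[:n_components], var[:n_components], explained

def surprisal_bits(x, mean, components, variances):
    """-log2 p(x) under the diagonal Gaussian in PCA space (higher = rarer)."""
    z = components @ (x - mean)
    nll_nats = 0.5 * np.sum(z**2 / variances + np.log(2 * np.pi * variances))
    return nll_nats / np.log(2)
```

With real CLIP embeddings you would call `fit_pca_gaussian(embeddings, n_components=50)` and check whether `explained` clears 0.95, the level the authors report. Use comfortably more than 50 samples so that no retained variance is zero.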
Guardrails and trade-offs
Push too far into rare territory and you'll get incoherent outputs. The pullback constraint helps, but you still need taste-level curation. Keep a tight loop: set boundaries (brand, palette, function), steer rarity in measured steps, and prune aggressively.
Quick prompts and workflows to try
- "Brutalist cafe exterior, avoids common facade patterns; highlight negative space, odd window rhythms, concrete with embedded textiles."
- "Editorial portrait, high semantic fidelity to subject, uncommon lighting geometry and texture overlays that don't exist in typical studio setups."
- "Concept car, recognizable as a sedan, but with unfamiliar surface breaks and asymmetrical light signatures."
Learn more and go deeper
Read the research summary on arXiv: Creative Image Generation with Diffusion Model. If CLIP is new to you, this primer helps: OpenAI's CLIP overview.
The takeaway
Creativity isn't a lucky accident here; it's a measurable target. By optimizing for rarity in CLIP space and keeping a firm grip on fidelity, this approach gives you a dial for "more original, still on-brief." That's the practical edge most teams actually need.