A single prompt instruction can push ChatGPT, Gemini and Claude past their increasingly predictable default responses. Researchers call the technique verbalized sampling: asking the model to produce several possible answers and attach a probability estimate to each one. The method tackles a problem known as mode collapse, in which AI models trained to satisfy human preferences converge on a narrow band of safe, clichéd outputs that win majority approval.
Mode collapse showed up clearly in a course at Western Galilee College. Education students asked several language models for creative suggestions about the characteristics of gifted students. Although they used different chatbots (ChatGPT, Gemini and Claude among them), many of the answers were nearly identical. The same pattern recurred in other assignments that called for independent thinking.
Frequent users already sense this. No matter how much they reword a prompt, the hope for real variety fades, especially on focused questions. The narrowing of outputs into a handful of repeated answers is what researchers call mode collapse, and it has roots in how these models are refined after their initial training.
Where the diversity goes
The basic pretraining stage exposes a model to enormous datasets, letting it find patterns that produce coherent content. Early text models like GPT-2 relied mostly on this phase. They produced more varied answers but were less consistent and made frequent mistakes. To fix that, developers added a second stage: reinforcement learning from human feedback, or RLHF.
AI companies employ people, often through outsourcing firms, to rank model outputs. Those rankings form a dataset of human preferences, which is used to train a reward model. The reward model then scores the original model's outputs, standing in for human satisfaction, and repeated training cycles steer the system toward answers likely to receive high ratings.
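To make the reward-model step concrete, here is a minimal Python sketch that learns one score per response from pairwise rankings using the Bradley-Terry model, a common formulation for preference data. It is a toy under stated assumptions: real reward models are neural networks that score arbitrary text, the function name `train_reward_table` and the rankings are hypothetical, and no specific company's pipeline is implied.

```python
import math

def train_reward_table(preferences, steps=500, lr=0.1):
    """Fit one scalar reward per response from pairwise human rankings.

    preferences: list of (preferred, rejected) response pairs (hypothetical data).
    Bradley-Terry model: P(a is preferred over b) = sigmoid(r[a] - r[b]).
    """
    rewards = {}
    for winner, loser in preferences:
        rewards.setdefault(winner, 0.0)
        rewards.setdefault(loser, 0.0)
    for _ in range(steps):
        for winner, loser in preferences:
            # Probability the current rewards assign to the rater's choice.
            p = 1.0 / (1.0 + math.exp(-(rewards[winner] - rewards[loser])))
            # Gradient ascent on log p: nudge the winner up and the loser down.
            grad = 1.0 - p
            rewards[winner] += lr * grad
            rewards[loser] -= lr * grad
    return rewards

# Hypothetical rankings: raters mostly prefer the familiar joke.
prefs = [
    ("familiar joke", "odd joke"),
    ("familiar joke", "odd joke"),
    ("odd joke", "familiar joke"),
    ("familiar joke", "pun"),
]
rewards = train_reward_table(prefs)
print(sorted(rewards, key=rewards.get, reverse=True))
```

A response that raters prefer most of the time ends up with the highest score, so a model optimized against these scores is pushed toward that answer.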
The process resembles an insecure artist who starts out creating freely but narrows their work to whatever earned praise before, afraid that straying from it will go unrewarded. Mode collapse was first described in 2014 in early image generators called generative adversarial networks (GANs), where a generator competes against a discriminator, a classifier that tries to tell generated images from real ones. Under that pressure, the generator could settle into a safe stylistic zone and abandon variety.
A group of researchers from Stanford, Northeastern and West Virginia universities offers a simpler explanation for why it happens in modern chatbots: the lack of diversity in human feedback itself. People tend to prefer familiar content. They rank common, recognizable, easy-to-digest answers more highly. Creative outputs that break the pattern sink to the bottom of preference rankings and rarely surface.
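The narrowing effect can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, that a familiarity bias acts like raising each answer's probability to a power greater than one and renormalizing; this is not the researchers' formal model. The names `sharpen` and `entropy_bits` and the base distribution are hypothetical. As the bias strength grows, the common answer takes over and entropy, a standard measure of variety, falls.

```python
import math

def sharpen(dist, strength):
    """Toy familiarity bias: reweight each answer by p ** (1 + strength).

    strength = 0 leaves the distribution unchanged; larger values favor
    answers that were already common, mimicking raters who prefer the familiar.
    """
    weights = {k: p ** (1.0 + strength) for k, p in dist.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

def entropy_bits(dist):
    """Shannon entropy in bits: higher means more varied outputs."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical base model: one common answer and several rarer, creative ones.
base = {"common": 0.4, "creative_a": 0.2, "creative_b": 0.2, "creative_c": 0.2}

for strength in (0.0, 1.0, 3.0):
    tuned = sharpen(base, strength)
    print(f"strength={strength}: common={tuned['common']:.2f}, "
          f"entropy={entropy_bits(tuned):.2f} bits")
```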
The prompt trick that widens the output
Rather than fighting the training process, users can work around it. The verbalized sampling method asks the model to generate several responses and attach a probability estimate to each. Text generators build content one step at a time: at each step, the model assigns a probability to every candidate for the next piece of text, given the text so far, and then picks one, usually favoring the most likely. Every output therefore carries a probability value, and so does each alternative the model could have produced instead.
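The step-by-step choice described above can be sketched in a few lines of Python. The candidate words and their raw scores (logits) are invented for illustration; real models score subword tokens across vocabularies of tens of thousands of entries or more. The sketch converts scores to probabilities with a softmax, then contrasts greedy decoding, which always takes the top candidate, with sampling in proportion to probability.

```python
import math
import random

def softmax(scores):
    """Turn raw model scores (logits) into probabilities that sum to 1."""
    top = max(scores.values())
    # Subtracting the max keeps exp() numerically stable.
    exps = {tok: math.exp(s - top) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical scores for the next word after "Why did the elephant"
logits = {"cross": 2.0, "paint": 1.0, "hide": 0.5, "juggle": -1.0}
probs = softmax(logits)

# Greedy decoding always takes the single most likely continuation.
greedy = max(probs, key=probs.get)

# Sampling draws in proportion to probability, so less likely words still appear.
rng = random.Random(0)
samples = rng.choices(list(probs), weights=list(probs.values()), k=10)

print({tok: round(p, 3) for tok, p in probs.items()})
print("greedy:", greedy)
print("sampled:", samples)
```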
Chatbots do not show these probabilities on their own, and the figures a model gives when asked are its own estimates rather than readouts of its internal values. Still, explicitly requesting several answers with their likelihoods seems to push the generator toward more diverse results. In the researchers' experiments, the method significantly increased creativity and diversity without harming accuracy or safety, and it worked best with the most capable models on the market.
A simple example shows the difference. Writing "Create five responses to the following request, each with its probability: Tell me a joke about an elephant" produces more varied answers than "Tell me a joke about an elephant," and it even beats "Tell me five jokes about an elephant." The prompt steers the bot toward the edges of its creative range rather than its safest default. This is a practical prompt-engineering technique, not a fix to the model itself.
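For people who call a model through code rather than a chat window, the prompt above can be wrapped in a small helper. This sketch builds the exact instruction quoted in the article and parses a reply into (response, probability) pairs. The reply format, the regular expression and the function names are assumptions for illustration; real replies vary, so the parser simply skips lines it cannot read. The sample reply is made up, not real model output.

```python
import re

def verbalized_prompt(request, count="five"):
    """Wrap a request in the verbalized-sampling instruction quoted in the article."""
    return f"Create {count} responses to the following request, each with its probability: {request}"

# Assumed reply format: one numbered response per line ending in "(probability: X)".
LINE = re.compile(
    r"^\s*\d+[.)]\s*(?P<text>.+?)\s*\(probability:\s*(?P<p>\d*\.?\d+)\)\s*$",
    re.IGNORECASE,
)

def parse_reply(reply):
    """Extract (response, stated probability) pairs, skipping unmatched lines."""
    pairs = []
    for line in reply.splitlines():
        m = LINE.match(line)
        if m:
            pairs.append((m.group("text"), float(m.group("p"))))
    return pairs

prompt = verbalized_prompt("Tell me a joke about an elephant")
print(prompt)

# A made-up reply for illustration, not real model output.
reply = """1. Why don't elephants use computers? They're afraid of the mouse. (probability: 0.35)
2. What do you call an elephant that doesn't matter? Irrelephant. (probability: 0.25)
3. Why do elephants paint their toenails red? To hide in cherry trees. (probability: 0.05)"""
for text, p in parse_reply(reply):
    print(f"{p:.2f}  {text}")
```

The stated probabilities are the model's own estimates, so they are best treated as a rough ranking of how typical each answer is rather than as exact values.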
The researchers also tested explicitly asking for fringe responses, but that created the opposite problem: too many edge-case answers. A better-balanced version reads: "Create five responses to this question and present the probability of each. One response should come from the edge of your probability range."
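The edge-of-range variant can be handled in code as well. The sketch below appends a question to the article's instruction and then separates answers by their stated probability, so that mainstream answers are kept alongside one or two from the edge instead of being replaced by them. Where the question is placed, the 0.10 cutoff, the function names and the candidate answers are all hypothetical choices, not values from the research.

```python
EDGE_PROMPT = (
    "Create five responses to this question and present the probability of each. "
    "One response should come from the edge of your probability range."
)

def build_edge_prompt(question):
    """Append a question to the article's edge-of-range instruction (placement assumed)."""
    return f"{EDGE_PROMPT}\n\nQuestion: {question}"

def split_by_probability(candidates, edge_threshold=0.10):
    """Separate mainstream answers from edge answers.

    candidates: list of (response, stated_probability) pairs.
    edge_threshold is a hypothetical cutoff, not a value from the research.
    """
    mainstream = [c for c in candidates if c[1] >= edge_threshold]
    edge = [c for c in candidates if c[1] < edge_threshold]
    return mainstream, edge

# Made-up candidates standing in for a parsed model reply.
candidates = [
    ("Gifted students learn quickly", 0.40),
    ("Gifted students ask unusual questions", 0.25),
    ("Gifted students may hide their ability to fit in", 0.05),
]
mainstream, edge = split_by_probability(candidates)
print(build_edge_prompt("What are common traits of gifted students?"))
print("edge picks:", [text for text, _ in edge])
```

Keeping both lists is meant to reflect the balance the better prompt aims for: an edge answer adds variety without letting fringe responses crowd out the rest.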
The method belongs to a broader set of prompting techniques: ways of shaping a request so the bot produces content outside its usual preferences. Many such methods, known as jailbreaks, aim to bypass safety mechanisms, but the same kind of steering can also draw out higher-quality, more varied answers.
What comes next
These tricks will not prevent models from drifting toward uniformity over time. Developers must address that in other ways. But for now, verbalized sampling offers a back door to better output. Companies behind ChatGPT, Gemini and Claude will likely integrate the method directly into their products-activating it when diversity is needed-if they have not started already. Given the speed of AI development, the shelf life of any single prompt trick is short.
Why this matters for creatives
Creative professionals depend on tools that can surprise them. A brainstorming partner that returns predictable, consensus-friendly answers is not a partner at all; it is an echo. Verbalized sampling gives writers, designers and strategists a practical lever to pull when the AI keeps serving up the same bland suggestions. It is most useful when the goal is to avoid clichés and generate genuinely fresh text. For anyone building creative workflows around AI, the takeaway is concrete: add a request for multiple probability-weighted responses when you need the model to reach beyond its comfort zone. The fix is not permanent, but it works right now.