AI vs 100,000 Humans: Who Wins the Creativity Contest?
Generative AI can now beat the average person on certain creativity tests. But the top 10% of human creators still win where it counts: richer, more nuanced work like poetry and storytelling.
That's the headline from a large-scale study led by Professor Karim Jerbi at the Université de Montréal, comparing leading AI models (including GPT-4, Claude, and Gemini) against more than 100,000 people. It's the biggest head-to-head we've seen on creativity so far.
What the Study Actually Tested
The core metric was the Divergent Association Task (DAT), a quick test of divergent thinking. Participants list 7-10 nouns that are as unrelated as possible. The more semantically distant the words, the higher the score.
The DAT correlates with other creativity measures in writing, idea generation, and problem solving. It's fast (2-4 minutes) and accessible. If you want to see how it works, you can try a version here: Divergent Association Task.
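The DAT's scoring idea (average semantic distance between the listed nouns) can be sketched in a few lines. The published test scores words against pretrained word embeddings; the tiny hand-made vectors below are stand-ins for illustration only, and the word choices are hypothetical.

```python
import math
from itertools import combinations

def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm

def dat_score(vectors):
    """Average pairwise cosine distance across all word pairs, scaled by 100."""
    pairs = list(combinations(vectors.values(), 2))
    return 100 * sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)

# Toy 3-d "embeddings": related words point in similar directions.
related = {"cat": (1.0, 0.1, 0.0), "dog": (0.9, 0.2, 0.1), "mouse": (0.8, 0.0, 0.2)}
unrelated = {"cat": (1.0, 0.0, 0.0), "volcano": (0.0, 1.0, 0.0), "algebra": (0.0, 0.0, 1.0)}

print(dat_score(related) < dat_score(unrelated))  # True: unrelated nouns sit farther apart
```

The intuition carries over directly: "cat, dog, mouse" cluster together and score low, while "cat, volcano, algebra" spread out and score high.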
Results at a Glance
- Some large language models outperformed average human scores on the DAT.
- On practical writing tasks (haiku, plot summaries, short stories), AI sometimes beat the average human, but the best human creators consistently scored higher.
- There's a ceiling: the top 10% of humans are still well ahead of today's models on depth, originality, and voice.
Why Models Can Score High but Still Feel Generic
LLMs are good at maximizing semantic distance. They've seen massive text corpora and can pick far-apart words on command. That spikes scores on tests like the DAT.
But creativity isn't only distance. It's coherence, taste, risk, subtext, and emotional truth sustained over time. That's where top human work still pulls away from even the strongest model outputs.
Tuning AI Creativity: Temperature, Prompts, Constraints
AI creativity isn't fixed. Dialing up temperature makes outputs more varied and less predictable. Lower temperature = safer, more conventional. Higher temperature = more exploratory, with a trade-off in coherence.
Prompts matter. Instructions that ask the model to consider etymology, structure, or opposing frames tend to produce more surprising associations and better scores. For deeper prompt tactics, see Prompt Engineering.
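What temperature actually does can be shown with a minimal softmax-sampling sketch: it rescales the model's logits before normalizing, so low values sharpen the distribution toward the safest pick and high values flatten it toward exploration. The candidate words and logits below are made up for illustration.

```python
import math
import random

def sample(logits, temperature, rng):
    """Softmax sampling: divide logits by temperature, normalize, draw an index."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r, cum = rng.random(), 0.0
    for i, e in enumerate(exps):
        cum += e / total
        if r <= cum:
            return i
    return len(exps) - 1

# Hypothetical next-word candidates, in descending plausibility.
words = ["ocean", "river", "lantern", "algebra"]
logits = [3.0, 2.0, 1.0, 0.0]
rng = random.Random(0)
for t in (0.2, 0.7, 1.5):
    picks = [words[sample(logits, t, rng)] for _ in range(1000)]
    print(t, round(picks.count("ocean") / 1000, 2))
```

At temperature 0.2 the top candidate dominates almost every draw; at 1.5 the tail words ("lantern", "algebra") start appearing, which is exactly the variety-versus-coherence trade-off described above.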
Practical Workflows for Creative Pros and Researchers
- Define intent: novelty, coherence, or both? Set a scoring rubric before generating.
- Run a "DAT warm-up": ask the model for 10 unrelated nouns on your theme to expand the search space.
- Temperature sweep: sample at 0.2, 0.7, and 1.0; compare signal vs. noise; keep the best lines.
- Constraint prompts: fixed syllables (haiku), forced metaphors, or banned clichés to pressure-test originality.
- Idea sprints: generate 20 options in minutes, then spend real time editing two that have teeth.
- Human curation: select, merge, rewrite. Keep voice and narrative intent human-led.
- Evaluation loop: score drafts on novelty, coherence, emotional impact, and specificity. Iterate with focused prompts.
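The evaluation loop above can be made concrete with a weighted rubric. The criteria names, weights, and draft ratings below are illustrative assumptions, not part of the study.

```python
def rubric_score(ratings, weights):
    """Weighted average of 1-5 ratings across rubric criteria."""
    total_weight = sum(weights.values())
    return sum(ratings[c] * weights[c] for c in weights) / total_weight

# Hypothetical rubric: weight novelty and coherence above the rest.
weights = {"novelty": 2, "coherence": 2, "emotional_impact": 1, "specificity": 1}
draft_a = {"novelty": 5, "coherence": 2, "emotional_impact": 3, "specificity": 2}
draft_b = {"novelty": 3, "coherence": 4, "emotional_impact": 4, "specificity": 4}

print(round(rubric_score(draft_a, weights), 2))  # 3.17
print(round(rubric_score(draft_b, weights), 2))  # 3.67
```

Scoring drafts this way makes the iteration loop explicit: draft_a is more novel but less coherent, and the weights decide which trade-off wins.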
Where AI Helps Today
- Brainstorming themes, titles, taglines, and angle lists you wouldn't consider under time pressure.
- Metaphor generation and cross-domain references to break habitual thinking.
- Plot beats, outline variants, and constraint-driven drafts to overcome blank-page friction.
- Research assistance: quick scans of related ideas, edge cases, and counterexamples to widen perspective.
Where Humans Still Win
Voice that feels lived-in. Emotional risk. Subtext that evolves across paragraphs or scenes. The best creators weave specificity, contradiction, and personal history into work that still lands after the novelty wears off.
So, Will AI Replace Creators?
Unlikely. The study suggests a better frame: AI as a creative assistant. It can expand your option set and speed up exploration, but it still relies on human direction, taste, and final judgment.
If you create for a living, use AI to explore wider. Then say what only you can say.