AI now beats average human creativity - but the best humans still set the bar
A large multi-university study compared modern language models with humans on creativity tests. The headline: top models can outscore the average person on specific tasks, yet the most creative humans remain well ahead.
For creatives, researchers, and writers, that's the real story. Machines can help you produce more - and sometimes surprise you - but they still benefit from your taste, judgment, and lived context.
What the researchers tested
The team from Université de Montréal, Université Concordia, and the University of Toronto evaluated divergent thinking - the skill of generating many different ideas. Results were published in Scientific Reports, part of the Nature Portfolio.
The core test was the Divergent Association Task (DAT). You list ten words that are as unrelated to one another as possible. A strong response looks like: "galaxy, fork, freedom, algae, harmonica, quantum, nostalgia, velvet, hurricane, photosynthesis." The score is the average semantic distance between the words, computed from word embeddings rather than subjective human ratings.
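To make the scoring concrete, here is a minimal sketch of a DAT-style score: embed each word, then average the pairwise cosine distances. The random vectors below are placeholders for illustration only; the published task uses pretrained word embeddings (e.g., GloVe) and its own scaling.

```python
# Minimal sketch of DAT-style scoring: average pairwise cosine distance
# between word embeddings. Random vectors are stand-ins for real
# pretrained embeddings (e.g., GloVe), used here only for illustration.
from itertools import combinations
import numpy as np

def dat_style_score(words, embeddings):
    """Mean pairwise cosine distance (higher = more unrelated)."""
    distances = []
    for a, b in combinations(words, 2):
        va, vb = embeddings[a], embeddings[b]
        cos_sim = np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb))
        distances.append(1.0 - cos_sim)  # cosine distance
    return float(np.mean(distances))

words = ["galaxy", "fork", "freedom", "algae", "harmonica",
         "quantum", "nostalgia", "velvet", "hurricane", "photosynthesis"]
rng = np.random.default_rng(0)
embeddings = {w: rng.normal(size=300) for w in words}  # placeholder vectors
print(f"DAT-style score: {dat_style_score(words, embeddings):.3f}")
```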
They also analyzed short creative writing: haikus, movie plot summaries, and brief stories. The focus was on idea variety and unpredictability across sentences.
Sources: Scientific Reports (Nature) | Divergent Association Task
Key results
- Average-beating AI: On the DAT, GPT-4 scored higher than the average of the human sample. Gemini performed on par with humans; other models varied.
- Top humans win: The most creative half of human participants scored higher than all models tested. The top 10% widened the gap further.
- Writing is harder: GPT-4 led other models in haikus, summaries, and stories, but humans still scored higher overall - especially when ideas had to connect across sentences.
How models approach creativity (and why it matters)
Models tend to reuse a narrow set of words. GPT-4 leaned on terms like "microscope" and "elephant." GPT-4-turbo used "ocean" in a large share of responses. Humans showed broader variety, with no single word dominating.
Models with weaker scores often ignored the instructions or produced lists of barely meaningful words. When not explicitly prompted to be creative, scores fell sharply - evidence that good results depend on clear guidance.
Tuning artificial creativity: settings and prompts
Two levers changed outcomes dramatically: temperature and prompt design.
- Temperature: Higher values increase variability and risk-taking. As temperature rose, GPT-4's DAT scores jumped; at the highest setting tested, it scored higher than about 72% of human participants. Word repetition dropped as variety expanded. (A temperature-sweep sketch follows this list.)
- Prompt design: Asking models to consider word origins and etymology boosted scores. Asking for "opposites" often hurt scores because antonyms are still closely related.
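As a minimal sketch of such a temperature sweep, assuming the OpenAI Python SDK (v1) with an API key in the environment; the model name and prompt wording are illustrative, not the study's exact setup:

```python
# Hypothetical temperature sweep on a DAT-style prompt. Assumes the
# OpenAI Python SDK v1 (`pip install openai`) and OPENAI_API_KEY set
# in the environment. The model name is illustrative.
from openai import OpenAI

client = OpenAI()
PROMPT = ("List 10 words that are as unrelated to each other as possible. "
          "Consider etymology and favor rare concepts. Words only, comma-separated.")

for temperature in (0.2, 0.7, 1.0, 1.4):
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{"role": "user", "content": PROMPT}],
        temperature=temperature,
    )
    print(f"T={temperature}: {response.choices[0].message.content}")
```

Comparing the outputs side by side makes the effect visible: low-temperature lists repeat safe, common nouns, while higher settings pull in rarer, more distant concepts.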
Practical playbook for creatives and researchers
- Idea sprints: Use high temperature for raw ideation (titles, motifs, angles, metaphors). Then lower temperature to refine.
- DAT-style warmups: Prompt the model: "List 10 words that are as unrelated as possible. Consider etymology and use rare concepts." Use the best words as seeds for headlines, scenes, or research questions.
- Prompt patterns that help: "Mix concepts from distant domains," "Avoid common nouns," "Explain why each idea is unusual," "Vary sentence rhythm."
- Prompt patterns that hurt: "Use opposites," "Be random," or vague asks with no constraints.
- Write long, edit human: For stories and summaries, let the model produce several takes. Keep the surprising parts. Rebuild structure and voice yourself.
- Measure, don't guess: Track variety (unique concepts), surprise (non-obvious links), and clarity (coherence) across drafts; a rough scoring sketch follows this list.
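One rough way to start measuring, sketched below: count unique content words as a variety proxy and flag words that dominate across drafts. The stopword list and metrics are simple stand-ins, not the study's scoring method.

```python
# Rough draft-scoring sketch for the "measure, don't guess" step.
# "Variety" is counted as unique content words per draft; the stopword
# list and the top-5 cutoff are arbitrary choices, not from the study.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "that"}

def variety(draft: str) -> int:
    """Count unique content words as a cheap proxy for idea variety."""
    tokens = re.findall(r"[a-z']+", draft.lower())
    return len({t for t in tokens if t not in STOPWORDS})

def repetition(drafts: list[str]) -> list[tuple[str, int]]:
    """Flag words that dominate across drafts (the 'ocean' problem)."""
    counts = Counter(t for d in drafts
                     for t in re.findall(r"[a-z']+", d.lower())
                     if t not in STOPWORDS)
    return counts.most_common(5)

drafts = ["A harmonica drifts through the hurricane's quiet eye.",
          "Velvet algae archive the galaxy's oldest nostalgia."]
for i, d in enumerate(drafts, 1):
    print(f"Draft {i}: variety={variety(d)}")
print("Most repeated words:", repetition(drafts))
```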
Where humans still lead
- Cross-sentence cohesion: Humans weave ideas over paragraphs with richer context and intent.
- Lived experience: Personal taste, constraints, and domain depth still separate great work from workable output.
- Originality over time: Humans avoid repetition more naturally and push past clichés with motivation and feedback loops.
What this means for your work
AI can match or beat average performance on narrow tests. With the right setup, it's a strong collaborator for generating options fast. But the ceiling still belongs to you.
Use models to explore breadth. Use your judgment to choose, connect, and polish. That's the combination that wins projects, papers, and pages.
Quick setup checklist
- Start with temperature 0.9-1.2 for ideation, 0.2-0.5 for editing and structure.
- State constraints: "Avoid common nouns. Favor rare, domain-distant terms. Explain your choices."
- Iterate: "Give 3 distinct approaches that disagree with each other" - see the sketch after this checklist.
- Score your drafts: count unique concepts, rate surprise, then tighten clarity.
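Putting the checklist together, a hypothetical two-stage workflow: high temperature to diverge, low temperature to converge. Same SDK assumptions as the earlier sketch; the prompts, model name, and the `ask` helper are illustrative.

```python
# Sketch of the two-temperature workflow from the checklist: high T to
# diverge, low T to converge. Assumes the same OpenAI SDK setup as the
# earlier sketch; `ask` is a hypothetical one-call helper.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str, temperature: float) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content

# Diverge: three approaches that disagree, at ideation temperature.
ideas = ask("Give 3 distinct approaches to a story about memory "
            "that disagree with each other. Avoid common nouns.", temperature=1.1)
# Converge: tighten the most surprising one at editing temperature.
draft = ask("Pick the most surprising approach below and outline it "
            f"clearly:\n{ideas}", temperature=0.3)
print(draft)
```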
Bottom line: Treat AI as an idea engine and draft assistant. Keep ownership of taste, structure, and meaning. That's how you get more volume without losing your voice.