In a 100,000-Person Creativity Test, AI Tops Average Humans, but Not the Best

AI now edges past the average human on some creativity tests, but the top humans still win by a mile. Great for generating options with the right tuning, yet no substitute for truly original talent.

Published on: Jan 22, 2026


AI can now edge out the average person on some creativity tests. But the most creative humans still sit in a league AI can't reach.

That's the takeaway from a large study comparing 100,000 people with nine leading AI models. Helpful for your workflow? Yes. A replacement for top creative talent? No.

In a Nutshell

  • AI creativity can be tuned, but it has limits.
  • Raising randomness (temperature) reduces repetition and improves scores.
  • GPT-4 outperformed typical humans on a word-diversity test; Google's GeminiPro matched them.
  • The top 10% of humans still beat every AI tested by a clear margin.
  • AI tends to repeat "safe" words; humans naturally vary their responses.

The 100,000-Person Experiment

Participants from the U.S., U.K., Canada, Australia, and New Zealand took a simple test: list 10 words as different from each other as possible. Models received the same prompt and produced 500 responses each.

GPT-4 topped the average human score on this task. GeminiPro matched average human performance. The human elite, the top 10%, outperformed every model.

The task aligns with the Divergent Association Task (DAT), a well-studied measure of semantic distance. If you want to try it yourself, the DAT project hosts a version online.
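
For intuition, the DAT's core metric can be sketched in a few lines: embed each word and average the pairwise cosine distances, so lists that spread out semantically score higher. This is a simplified sketch, not the official scorer (which uses GloVe 840B vectors, the first seven valid words, and its own normalization); the gensim model name below is just one publicly downloadable option.

```python
# Simplified DAT-style scoring: mean pairwise cosine distance between
# word embeddings, scaled by 100. Higher = more semantically divergent.
from itertools import combinations

import gensim.downloader as api                # downloads vectors on first use
from scipy.spatial.distance import cosine

vectors = api.load("glove-wiki-gigaword-300")  # one public GloVe option

def dat_style_score(words):
    words = [w.lower() for w in words if w.lower() in vectors]
    pairs = list(combinations(words, 2))
    if not pairs:
        return 0.0
    return 100 * sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)

# Similar words score low; deliberately scattered words score high.
print(dat_style_score(["arm", "eyes", "feet", "hand", "head", "leg"]))
print(dat_style_score(["bag", "bee", "burger", "feast", "office", "tree"]))
```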

The Repetition Problem Nobody Expected

Despite higher scores overall, models leaned on the same high-probability words. GPT-4 used "microscope" in 70% of outputs and "elephant" in 60%. GPT-4 Turbo dropped "ocean" into more than 90% of answers.

Humans barely repeated anything: "car" showed up 1.4% of the time, "dog" 1.2%, "tree" 1.0%. People naturally diversify. Models don't, unless you tune them.

Creativity Is a Setting, Not a Soul

By increasing temperature (the randomness setting), researchers cut repetition and lifted GPT-4's scores above 72% of human participants. Useful insight: you can "turn up" creative variety, but you're still sampling from patterns, not inventing from lived context.
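
To make the mechanism concrete, here is a toy sampler (plain NumPy, not any vendor's actual implementation) showing how dividing logits by a higher temperature flattens the distribution, so rarer words get picked instead of the same "safe" ones:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_token(logits, temperature):
    """Toy sampler: scale logits by 1/temperature, softmax, then sample."""
    scaled = np.array(logits) / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

logits = [4.0, 2.0, 1.0, 0.5]  # index 0 is the "safe", high-probability word
for t in (0.5, 1.0, 1.5):
    picks = [sample_token(logits, t) for _ in range(1000)]
    print(t, np.bincount(picks, minlength=4) / 1000)
# Low temperature: index 0 dominates. High temperature: the tail gets
# sampled far more often, which is what reduces repetition.
```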

For creatives, that means AI is helpful for ideation and breadth. The spark that makes ideas feel genuinely new still comes from you.

Newer and Bigger Aren't Automatically Better

GPT-4 Turbo performed worse on this creativity test than the original GPT-4. The likely reason: optimization for speed and cost can trade off against creative diversity.

Also notable: Vicuna, a smaller open-source model, beat several larger commercial models. Size and recency are poor proxies for originality.

Practical Playbook for Creatives and Research Teams

  • Increase temperature to 1.0-1.5 for ideation. Expect more variety and fewer repeats.
  • Add anti-repetition constraints: "Avoid repeating nouns used earlier. Penalize common high-frequency words."
  • Sample in batches. Generate 5-10 options, then curate. The win comes from selection, not a single pass (see the sketch after this list).
  • Seed with weirdness. Provide uncommon starter words or perspectives to break the "safe-word" loop.
  • Use structure prompts: "Give 3 concepts that don't share a domain, then combine 2 in a surprising way."
  • Treat AI as a breadth generator. You provide taste, context, and the final cut.
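
Here is a minimal sketch of that batch-and-curate loop, assuming the OpenAI Python SDK with an API key in OPENAI_API_KEY and the `dat_style_score` sketch from earlier; the model name and the naive line parsing are placeholders to adapt:

```python
# Sketch: sample several candidate word lists at high temperature,
# score each for semantic spread, and keep only the best one.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = ("List 10 single words that are as different from each other "
          "as possible, one word per line.")

def best_of_n(n=8, temperature=1.3):
    candidates = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model="gpt-4",            # swap in whichever model you use
            messages=[{"role": "user", "content": PROMPT}],
            temperature=temperature,  # high for ideation, per the playbook
        )
        # Naive parsing: one word per line; tighten for production use.
        words = resp.choices[0].message.content.lower().split()
        candidates.append(words)
    # Curation is the win: keep the most divergent list, discard the rest.
    return max(candidates, key=dat_style_score)  # scorer sketched earlier
```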

Copy-and-Paste Prompt to Reduce Safe Outputs

Instruction: "List 12 words with maximal semantic distance. Avoid common nouns like car, tree, dog, ocean, elephant, microscope. Do not repeat word categories. Include at most 2 living things and 2 artifacts."

Settings: temperature 1.2-1.4; top-p 0.9-1.0; 3-5 samples. Keep the best, discard the rest.
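
Mapped onto API parameters, those settings look roughly like this (a sketch using the OpenAI Python SDK's parameter names; other providers expose equivalent knobs):

```python
from openai import OpenAI

client = OpenAI()

INSTRUCTION = ("List 12 words with maximal semantic distance. Avoid common "
               "nouns like car, tree, dog, ocean, elephant, microscope. Do not "
               "repeat word categories. Include at most 2 living things and "
               "2 artifacts.")

resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": INSTRUCTION}],
    temperature=1.3,  # middle of the suggested 1.2-1.4 range
    top_p=0.95,       # middle of the suggested 0.9-1.0 range
    n=4,              # 3-5 samples in one call
)
# Keep the best, discard the rest.
candidates = [choice.message.content for choice in resp.choices]
```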

For writing: "Pitch 10 story premises spanning distinct domains (science, folklore, finance, micro-cultures). No premise may share setting, era, or conflict type with another."

What This Means for Your Work

If you're a top-tier writer, designer, or researcher, your edge is safe. Companies pay for originality at the high end, and current models don't meet that bar.

For teams, the strategy is simple: use AI to expand the option set, then rely on human judgment for the picks that matter. Quality still depends on your taste and decision-making.

Limitations to Keep in Mind

  • We don't know the full training data for several models, so prior exposure to the test can't be ruled out.
  • The test focuses on semantic distance. Creativity is broader: style, constraints, goals, and context also matter.
  • Some commercial model details weren't available, limiting feature-level conclusions.

Publication Details and Disclosures

The study, "Divergent Creativity in Humans and Large Language Models," was published in Scientific Reports on January 21, 2026. Human data collection was approved by relevant ethics boards.

Funding sources included Canadian research councils and fellowships. One researcher is affiliated with Google DeepMind but conducted the work independently. Full disclosures are listed in the paper.

If You Want to Get Better Outputs from AI

Learn prompt patterns that force variety, set constraints that fight repetition, and build a sampling-and-curation workflow. If you need a quick starting point, see our prompt engineering resources.


