AI Now Outperforms Average Humans on Creativity Tests, But Top Creators Still Lead
Artificial intelligence can beat average humans on specific creative tasks, according to a large-scale study published in Scientific Reports. Researchers from universities in Montreal and Toronto, including Concordia University, tested modern language models including GPT-4, Claude, and Gemini against data from over 100,000 human participants. The most creative people, however, still outpaced every model tested.
The findings matter for anyone working in creative fields. They show where AI tools can genuinely compete with human performance - and where they fall short.
How Researchers Measured Creativity
The team focused on divergent thinking, which involves generating many different ideas rather than finding one correct answer. Both humans and AI systems took the same tests, making direct comparison possible.
The primary tool was the Divergent Association Task (DAT). Participants were asked to produce ten words that were as unrelated to each other as possible. A strong response might include words like "galaxy, fork, freedom, algae, harmonica, quantum, nostalgia, velvet, hurricane, photosynthesis."
Scoring is automated: an algorithm measures how semantically distant the words are from one another, eliminating subjective judgment. The task takes only a few minutes.
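As an illustration, here is a minimal sketch of that style of scoring: the average pairwise distance between word embeddings, scaled to a 0-100 range. The random vectors below are stand-ins only; an actual scorer would look up pretrained embeddings (such as GloVe) for each submitted word.

```python
import itertools
import numpy as np

def dat_style_score(vectors: dict[str, np.ndarray]) -> float:
    """Average pairwise cosine distance between word vectors, scaled to 0-100.

    Higher scores mean the words are, on average, semantically farther apart.
    """
    distances = []
    for w1, w2 in itertools.combinations(vectors, 2):
        v1, v2 = vectors[w1], vectors[w2]
        cosine_similarity = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
        distances.append(1.0 - cosine_similarity)  # cosine distance
    return 100.0 * float(np.mean(distances))

# Random vectors stand in for real pretrained embeddings, which a real scorer
# would retrieve for each of the ten submitted words.
rng = np.random.default_rng(seed=0)
example_words = ["galaxy", "fork", "freedom", "algae", "harmonica"]
toy_vectors = {word: rng.normal(size=300) for word in example_words}
print(f"DAT-style score: {dat_style_score(toy_vectors):.1f}")
```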
Researchers also tested creative writing. Humans and AI models generated haikus, movie plot summaries, and short fictional stories. Analyses measured how many different ideas were combined and how unpredictable the writing was.
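To give a rough sense of what an "unpredictability" measure can look like, here is a toy sketch using word-level Shannon entropy. This is only an illustrative proxy, not the metric used in the study.

```python
import math
from collections import Counter

def word_entropy(text: str) -> float:
    """Shannon entropy (in bits) of a text's word-frequency distribution.

    Higher values mean the word choices are more varied, i.e. less predictable.
    """
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Two ten-word texts, so the comparison is fair: repetition lowers entropy.
print(word_entropy("the cat sat on the mat and the cat slept"))
print(word_entropy("velvet hurricanes hum forgotten algebra across silver tidepools tonight quietly"))
```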
When AI Exceeded Human Average
GPT-4 achieved a higher average score than the full human sample on the DAT. Gemini Pro performed at levels statistically similar to humans overall. Other models scored lower.
But the picture shifts when comparing top performers. The most creative half of human participants scored higher than all AI models tested. The top 10 percent of human scorers widened that gap significantly.
"Even the best AI systems still fall short of the levels reached by the most creative humans," said Karim Jerbi, who led the research at Université de Montréal.
How Machines Think Differently
The study revealed distinct differences in how AI and humans approach creative tasks. Language models relied on narrow word sets. GPT-4 frequently repeated terms like "microscope" and "elephant." Humans showed far more variety, with no single word appearing in more than a small fraction of answers.
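A quick way to see this "narrow word set" effect is simply to count how often each word recurs across many responses. The sketch below uses made-up responses purely for illustration.

```python
from collections import Counter

# Hypothetical DAT responses (one ten-word list per response); a real analysis
# would use the actual model outputs and human submissions.
responses = [
    ["microscope", "elephant", "galaxy", "fork", "velvet",
     "quantum", "algae", "harmonica", "nostalgia", "hurricane"],
    ["microscope", "elephant", "freedom", "cactus", "violin",
     "glacier", "sarcasm", "tornado", "enzyme", "lantern"],
    ["microscope", "ocean", "elephant", "pixel", "cinnamon",
     "orbit", "whisper", "granite", "jury", "firefly"],
]

# Because each word appears at most once per list, the count equals the number
# of responses containing that word; repeated favourites stand out immediately.
word_counts = Counter(word for response in responses for word in response)
for word, count in word_counts.most_common(3):
    print(f"'{word}' appears in {count} of {len(responses)} responses")
```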
Models with lower creativity scores were more likely to ignore instructions or generate less meaningful lists. When asked to write generic word lists without creative prompts, their scores dropped sharply. This confirmed that high scores reflected deliberate performance, not random output.
Adjusting AI Creativity
One critical finding: AI creativity can be easily tuned. Researchers adjusted a setting called temperature, which controls how predictable responses are. Higher temperature values encourage riskier, more varied output.
As temperature increased, GPT-4's creativity scores rose sharply. At the highest setting tested, the model scored higher than about 72 percent of human participants. Word repetition declined as the model explored broader vocabulary.
Prompt design mattered equally. When researchers instructed models to focus on word origins and etymology, creativity scores increased further. Other strategies, such as asking models to use opposites, reduced creativity because opposing words often remain closely related in meaning.
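As a purely illustrative sketch (the researchers' exact prompts, model versions, and settings are not reproduced here), a temperature sweep with the OpenAI Python client might look like the following; the prompt wording and model name are placeholders.

```python
from openai import OpenAI  # assumes the official OpenAI Python client is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder prompt echoing the strategies described above (unrelated words,
# attention to word origins); not the study's actual wording.
PROMPT = (
    "List ten single English words that are as unrelated to each other as possible. "
    "Consider each word's origin and etymology when choosing."
)

# Higher temperature makes token sampling less predictable, which the study
# found raised GPT-4's divergent-thinking scores.
for temperature in (0.5, 1.0, 1.5):
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder; substitute whichever model you have access to
        messages=[{"role": "user", "content": PROMPT}],
        temperature=temperature,
    )
    print(f"temperature={temperature}:\n{response.choices[0].message.content}\n")
```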
These results show that AI creativity depends heavily on human configuration and guidance.
Writing Tells a Different Story
Strong DAT performance did not always translate to superior creative writing. GPT-4 outperformed the other models on haikus, movie summaries, and short stories, but human writers still scored higher overall, especially on tasks requiring ideas woven across sentences.
Temperature settings boosted creativity for longer texts but had little effect on haikus. Visual analyses showed that human and machine writing occupied different regions of meaning, suggesting that similar scores can mask deep differences in how ideas form.
What This Means for Your Work
The findings challenge claims that AI is replacing human creativity. Machines can now rival or exceed average human performance on narrow tasks, but they lack the depth, lived experience, and flexible thinking of highly creative people.
The practical takeaway: generative AI works best as a tool in service of human creativity, not as a replacement. Understanding how these systems work - their strengths on specific tasks and their limitations - helps creatives use them effectively.
For professionals in creative fields, AI for Creatives training can clarify how these tools fit into your workflow and where human creativity remains essential.