AI now matches average human creativity on specific tests
Can generative AI be creative? A new study in Scientific Reports suggests large language models (e.g., GPT-4) now meet or beat the average human on standardized creativity exercises. The top human creators still lead by a clear margin, but the middle of the distribution is getting crowded.
For teams in science and research, this isn't a novelty; it's a workflow shift. AI can reliably contribute "good-enough" novel ideas on demand, while your best people remain the source of standout originality.
How creativity was measured
The researchers used the Divergent Association Test (DAT), which asks for ten words whose meanings are as far apart as possible. This taps mental flexibility and the ability to form unlikely links, not just vocabulary breadth. Because it's quick, it supports large-scale comparison; the AI outputs were benchmarked against a dataset of 100,000 human participants.
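The scoring idea behind the DAT is easy to approximate in-house. Below is a minimal sketch of a DAT-style novelty score, assuming the sentence-transformers package and an off-the-shelf embedding model; the published DAT uses GloVe vectors and a validated word list, so treat this as a rough proxy rather than the official scoring code.

```python
# Rough DAT-style score: mean pairwise cosine distance between word embeddings,
# scaled by 100 as in the published DAT scoring. Assumes sentence-transformers;
# the official test uses GloVe vectors, so numbers here won't match it exactly.
from itertools import combinations
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model will do

def dat_style_score(words):
    vecs = model.encode(words)
    dists = [1 - float(cos_sim(vecs[i], vecs[j]))
             for i, j in combinations(range(len(words)), 2)]
    return 100 * sum(dists) / len(dists)

# Related words score low; semantically scattered words score higher.
print(dat_style_score(["cat", "dog", "mouse", "hamster", "rabbit"]))
print(dat_style_score(["quark", "sonata", "vinegar", "glacier", "umbrella"]))
```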
They then extended the test to richer tasks like writing haikus and sketching film premises. Same pattern: AI often outperforms the average participant, but the most original humans, especially the top 10%, produce ideas AI doesn't match.
Key results at a glance
- On the DAT, GPT-4 scored above the standard human average.
- On haikus and scenario ideation, AI frequently matched or exceeded typical outputs.
- Elite human creators still produced the most original ideas by a wide margin.
- AI performance was sensitive to settings (e.g., temperature) and prompt design.
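To make the settings point concrete, here is a minimal sketch of a temperature sweep using the OpenAI Python client (v1+), assuming an API key is configured in the environment; the model name and prompt are illustrative placeholders, not details taken from the study.

```python
# Illustrative temperature sweep: the same prompt, sampled at three temperatures.
# Assumes the openai package (>=1.0) with OPENAI_API_KEY set; "gpt-4o" is a
# placeholder model name, not the model evaluated in the study.
from openai import OpenAI

client = OpenAI()
prompt = "List ten words whose meanings are as different from each other as possible."

for temperature in (0.2, 0.8, 1.3):
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,  # higher values trade coherence for surprise
    )
    print(f"--- temperature={temperature} ---")
    print(resp.choices[0].message.content)
```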
What this means for research workflows
AI is now a dependable "first-pass inventor" for ideation-heavy tasks: hypothesis variants, naming studies, generating feature sets, outlining experiments, or proposing alternative mechanisms. Treat it like a collaborator you direct, not an autonomous creator.
- Use AI to expand search breadth quickly, then apply expert filters for feasibility and novelty.
- Reserve human deep work for high-impact synthesis, risk assessment, and conceptual leaps.
- Document prompt protocols so your team can reproduce high-quality idea generation.
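One lightweight way to make the last point actionable: store the full generation protocol (model, settings, prompt template) next to each batch of ideas. The sketch below uses plain JSON; the field names and values are illustrative, not a standard schema.

```python
# Save a reusable idea-generation protocol so runs can be repeated and compared.
# Field names and values are illustrative placeholders, not a standard format.
import datetime
import json

protocol = {
    "name": "hypothesis-variants-v1",
    "model": "gpt-4o",          # placeholder model identifier
    "temperature": 1.1,
    "prompt_template": (
        "Propose {n} alternative mechanisms for {phenomenon}. "
        "Each under 12 words; no two may share a noun."
    ),
    "created": datetime.date.today().isoformat(),
}

with open(f"{protocol['name']}.json", "w") as f:
    json.dump(protocol, f, indent=2)
```

A record like this is enough for a teammate to rerun the same ideation pass later and compare outputs.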
Practical prompting to increase originality
- Adjust temperature upward to encourage less predictable outputs; lower it for focus and coherence.
- Prime for uncommon links: "Consider etymology, cross-domain analogies, and negated assumptions."
- Constrain formats: "Generate 10 ideas, each under 12 words, no shared nouns." Constraints force diversity.
- Stage the process: breadth first, then refinement. Ask for 50 quick seeds → cluster → elaborate the top 3 (see the sketch after this list).
- Use self-critique: "List reasons each idea may be trivial or derivative; improve the top 5."
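Putting the staging and self-critique bullets together, here is a minimal sketch using the OpenAI Python client (v1+); the topic, prompt wording, and model name are illustrative placeholders rather than prescriptions from the study.

```python
# Breadth first at high temperature, then self-critique and refinement at a
# lower one. Assumes the openai package (>=1.0) with an API key set; model name,
# topic, and prompts are illustrative, not taken from the study.
from openai import OpenAI

client = OpenAI()

def ask(prompt, temperature):
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return resp.choices[0].message.content

topic = "non-invasive ways to measure soil microbial activity"  # example topic

# Stage 1: cheap breadth, constrained format, high temperature.
seeds = ask(f"Generate 50 ideas for {topic}, each under 12 words, no shared nouns.", 1.2)

# Stage 2: self-critique and refinement, lower temperature for focus.
refined = ask(
    "For the ideas below, list reasons each may be trivial or derivative, "
    f"then improve and elaborate the 3 most original ones.\n\n{seeds}",
    0.5,
)
print(refined)
```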
How to evaluate outputs beyond vibe checks
- Word-distance metrics (as in the DAT) for a quick proxy of novelty.
- Inter-rater scoring with clear rubrics: novelty, usefulness, and clarity.
- De-duplication against prior lab outputs, literature, and known solution spaces (see the sketch after this list).
- Pilot testing: small, low-cost trials to validate signal before deeper investment.
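For the de-duplication step, a simple embedding filter goes a long way. The sketch below assumes the sentence-transformers package; the example ideas and the 0.85 similarity threshold are placeholders to tune against your own data.

```python
# Flag new ideas that sit too close to anything already in the lab's idea log.
# Assumes sentence-transformers; the threshold (0.85) and example ideas are
# placeholders, not values from the study.
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("all-MiniLM-L6-v2")

prior_ideas = [
    "measure soil CO2 flux with low-cost sensors",
    "drone thermal imaging of field plots",
]
new_ideas = [
    "cheap CO2 flux sensors buried in topsoil",
    "acoustic monitoring of root growth",
]

sims = cos_sim(model.encode(new_ideas), model.encode(prior_ideas))  # new x prior
for idea, row in zip(new_ideas, sims):
    verdict = "near-duplicate" if float(row.max()) > 0.85 else "novel enough"
    print(f"{verdict}: {idea}")
```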
Limits and open questions
- Standardized tasks don't capture the full arc of scientific creativity: problem framing, experimental cunning, and taste.
- Higher temperature can add noise, cliché, or contradictions; human judgment remains essential.
- Models can echo training data; provenance and attribution need oversight.
- The best people still matter most. AI amplifies a strong team; it doesn't replace one.
Where to learn more
- Scientific Reports (journal hosting the study)
- Divergent Association Test (PNAS)
Bottom line: AI can now generate solid, diverse ideas on command. Use it to widen the option set fast, then let expert judgment and domain knowledge decide what actually moves the work forward.