UNC study finds AI struggles with complex characters

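UNC researchers analyzing thousands of stories found that AI-written characters lack the mystery of human-written ones. A framework measuring eight dimensions of character portrayal shows that larger models do not necessarily produce more varied characters.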

Published on: Jul 02, 2026

Researchers at the University of North Carolina at Chapel Hill have found that AI-generated fiction characters consistently lack the mystery and ambiguity that make human-written stories memorable. The study, which examined thousands of stories using a new automated framework called CASPER, revealed that even the largest AI models default to tidy resolutions and familiar archetypes rather than the unresolved, contradictory traits that give literary characters depth.

"We found that AI models tend to 'play it safe' with their characters, in the sense that they wrap up storylines neatly," said Anneliese Brei, a graduate student in computer science at UNC-Chapel Hill and lead author of the study. "Human writers, on the other hand, are sometimes more willing to leave questions unanswered and let characters remain mysterious. That difference matters because ambiguity is often what makes a story linger with a reader."

How CASPER measures character complexity

The research team drew on literary theory to identify eight dimensions of character portrayal, then built CASPER, an automated evaluation framework, to measure those traits systematically across thousands of stories. The dimensions include whether characters appear realistic or exaggerated, whether they evolve during the narrative, and the degree to which they remain open to interpretation by the story's end.

This approach gives researchers a way to quantify character depth that had not previously been applied to AI-generated fiction at scale. The framework can serve as a benchmark for tracking whether newer systems genuinely improve at portraying complex characters, rather than simply becoming more fluent at generating prose.
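To make the idea of quantifying character variety concrete, here is a minimal sketch in Python. It is not CASPER's actual method, which the article does not detail. It assumes each character has already been given one categorical label per dimension, and it summarizes variety as the Shannon entropy of those labels. The dimension names, label values, and toy data are all hypothetical; only three of the eight dimensions are named in the article.

```python
from collections import Counter
from math import log2

# Hypothetical names for the three dimensions the article mentions.
DIMENSIONS = ("realism", "development", "openness")


def label_entropy(labels):
    """Shannon entropy, in bits, of a list of categorical labels."""
    counts = Counter(labels)
    total = len(labels)
    return sum((n / total) * log2(total / n) for n in counts.values())


def variety_by_dimension(characters):
    """Map each dimension to the entropy of its labels across characters."""
    return {
        dim: label_entropy([character[dim] for character in characters])
        for dim in DIMENSIONS
    }


# Toy data for illustration only; these are not results from the study.
human_characters = [
    {"realism": "realistic", "development": "changes", "openness": "open"},
    {"realism": "exaggerated", "development": "static", "openness": "resolved"},
    {"realism": "realistic", "development": "static", "openness": "open"},
    {"realism": "exaggerated", "development": "changes", "openness": "resolved"},
]
model_characters = [
    {"realism": "realistic", "development": "changes", "openness": "resolved"}
    for _ in range(4)
]

human_variety = variety_by_dimension(human_characters)
model_variety = variety_by_dimension(model_characters)
```

In this toy example the first set splits evenly between two labels on every dimension, so each entropy is 1 bit, while the second set repeats a single character type and scores 0 on every dimension. A benchmark along these lines would compare such scores across model versions or sizes.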

Bigger models don't mean better characters

One finding caught the researchers off guard. "One of our most surprising findings was that bigger and more powerful AI models don't necessarily create more varied characters than smaller ones," said Nicholas Sanaie, an undergraduate student in computer science at Carolina and co-author of the study. "That tells us the challenge isn't just about scale. It's about how these models understand storytelling itself."

The consistency of this pattern across model sizes suggests that scaling up existing architectures won't automatically solve the problem. As Sanaie's comment indicates, the limitation appears to lie in how these systems handle storytelling rather than in model size alone.

The study arrives as AI writing tools such as Sudowrite and Squibler gain adoption among fiction authors, and as film and television productions increasingly use AI for script outlines and dialogue generation. Surveys indicate that a substantial portion of fiction writers now incorporate AI into some stage of their creative workflow.

Why human ambiguity still matters

Snigdha Chaturvedi, associate professor of computer science at UNC-Chapel Hill and senior author of the study, framed CASPER as a diagnostic tool for developers. "As more people collaborate with AI to write novels, screenplays and other creative works, we need ways to understand both what these systems do well and where they fall short," she said. "CASPER gives us a lens for evaluating character depth and diversity, which can ultimately help developers build storytelling systems that better reflect the complexity of human experience."

For creative professionals who work with AI, the practical takeaway is clear: current tools can generate competent narrative structures, but the qualities that make characters resonate (contradiction, unresolved tension, the refusal of easy categorization) still require a human writer's judgment.

Why this matters for science and research professionals

The study signals a shift in how researchers evaluate generative AI. Rather than measuring fluency or coherence alone, the CASPER framework isolates a specific, theory-grounded dimension of output quality, character complexity, and tests it systematically. For researchers working on creative AI systems, this points toward a broader need: benchmarks that capture subtle, qualitative aspects of output rather than aggregate metrics that reward surface-level polish. The finding that larger models do not necessarily produce more varied characters also carries a practical implication: when the goal is output with genuine depth, changes to architecture and training strategy may matter more than parameter count.

