Study: Top AI Models Fail at Sports Analysis
Researchers at the University of North Carolina at Chapel Hill and Northeastern University tested how well leading AI systems perform at analyzing professional sports. The findings are clear: the models are terrible at it.
The study examined four core capabilities - perception, reasoning, simulation, and decision-making - using a dataset called SVI-bench. The researchers compiled 35,000 hours of basketball, soccer, and hockey footage, 15 million annotated plays, 15,000 hours of professional commentary, 23,000 post-game reports, and 103,000 statistical records.
Where AI Performs Best - and Still Fails
Perception was the models' strongest area. ChatGPT, Google's Gemini, and the open-source model Qwen correctly identified which player performed which action roughly 74 percent of the time, their best result on any of the four capabilities.
A professional announcer with a 74 percent accuracy rate would be fired.
Reasoning and Prediction Collapse
Performance dropped sharply when researchers asked the models to explain why plays happened the way they did. Success rates fell to around 40 percent on average.
In one test, researchers showed ChatGPT a Cody Martin three-pointer that bounced off the top of the backboard before going in - an unusual shot. The model said the unusual part was that it was "his first made three of the game." The actual unusual element was the trajectory.
Simulation tasks - predicting where a player would move based on their current trajectory - were worse. The best-performing model scored at roughly the level of random chance when asked to predict a player's next position, and accuracy dropped further when researchers asked the models to predict longer sequences of movement toward a goal.
Complex Analysis Nearly Impossible
When asked to perform the work of a human broadcaster - analyzing post-game statistics and trends to draw conclusions - the models achieved just 5 percent accuracy.
Lorenzo Torresani, a computer science researcher at Northeastern and study co-author, said the gap reveals a fundamental limitation: "AI cannot tell you why things happen, and it cannot tell you what's gonna happen next."
A competent sportscaster does more than describe what's visible. They explain why a play worked, anticipate what comes next, and identify which moments matter. The research shows AI handles description reasonably well but fails at everything else.
Implications Beyond Sports
The findings extend far beyond broadcasting. Torresani noted that the same limitation appears in any job where value comes from understanding causation, making predictions, determining significance, and recommending action - rather than simply describing what's visible.
For knowledge workers concerned about AI automation, the study offers a concrete example of where AI systems hit a wall. The gap isn't narrow. It's fundamental.