ChatGPT answers scientific true-or-false questions correctly only 60% better than random chance, study finds

ChatGPT answered scientific true-or-false questions correctly 80% of the time, but that amounts to only 60% better than random chance after adjusting for guessing. It also gave conflicting answers to identical questions and correctly flagged false statements just 16% of the time.

Categorized in: AI News, Science and Research
Published on: Mar 17, 2026

ChatGPT scores 60% better than random guessing on scientific true-or-false questions

Researchers at Washington State University tested ChatGPT's ability to evaluate whether hypotheses from scientific papers were supported by research. The results: the AI answered correctly about 80% of the time, but when adjusted for random chance, it performed only 60% better than guessing.

The study examined 719 hypotheses from business journals published since 2021. Researchers repeated each query 10 times using identical prompts to test consistency.
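The article does not spell out how the chance adjustment was computed, but a standard chance correction for a binary task (the same idea behind Cohen's kappa, assumed here for illustration) reproduces the reported figures: with 50% expected accuracy from coin-flipping, 80% raw accuracy works out to 60% of the possible improvement over guessing.

```python
def chance_corrected(accuracy: float, chance: float = 0.5) -> float:
    """Scale raw accuracy to how far above chance it sits,
    relative to the maximum possible improvement (kappa-style)."""
    return (accuracy - chance) / (1.0 - chance)

# 80% raw accuracy on a true/false task (chance = 50%):
print(chance_corrected(0.80))  # ≈ 0.6, i.e. "60% better than guessing"
```

Note that a perfect scorer gets 1.0 and a pure guesser gets 0.0 under this metric, which is why 80% raw accuracy looks less impressive once chance is factored in.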

Inconsistency emerged as a critical weakness

ChatGPT gave different answers to the same question across repeated prompts. In some cases, the AI answered "true" five times and "false" five times for an identical query.

Across all 10 repetitions of each identical prompt, ChatGPT gave a consistently correct answer for only 73% of statements. The AI struggled most with false statements, correctly identifying them just 16.4% of the time.

"We're not just talking about accuracy, we're talking about inconsistency, because if you ask the same question again and again, you come up with different answers," said Mesut Cicek, an associate professor at WSU's Carson College of Business and lead author of the study.

Fluent language masks weak reasoning

The findings highlight a gap between what these AI tools appear to do and what they actually do. ChatGPT can produce convincing, grammatically correct responses to complex questions, but it often reasons incorrectly while sounding authoritative.

"Current AI tools don't understand the world the way we do; they don't have a 'brain,'" Cicek said. "They just memorize, and they can give you some insight, but they don't understand what they're talking about."

Researchers tested both ChatGPT-3.5 (in 2024) and ChatGPT-4 mini (in 2025). Accuracy improved slightly between versions, but the pattern held: the AI performed only marginally better than chance when adjusted for random guessing.

What this means for your work

The study, published in the Rutgers Business Review, recommends that professionals verify AI results before relying on them for consequential decisions. This applies especially to tasks requiring nuance or complex reasoning.

Managers should train staff on what ChatGPT can and cannot do reliably. Treating AI outputs with skepticism is essential, particularly in scientific and research contexts where accuracy matters.

Cicek's team ran similar tests with other AI tools and found comparable results. Earlier research by the same group found that consumers were less likely to buy products marketed with an AI emphasis, suggesting skepticism extends beyond researchers.

"Always be skeptical," Cicek said. "I'm not against AI. I'm using it. But you need to be very careful."

For professionals in science and research, the takeaway is straightforward: use AI as a starting point, not a conclusion. Verify claims independently, especially when stakes are high.

