LLMs gauge well-being across 64 countries, but can't replace human data, PNAS study shows

A global study finds LLMs can spot broad well-being patterns, like links to income and health. Accuracy slips in low-data regions, so human surveys still matter.

Published on: Dec 04, 2025
AI and Global Well-Being: What LLMs Get Right, and Where They Fall Short

Professor Nattavudh Powdthavee (Prof Nick), Professor of Economics and Associate Dean for Faculty Affairs, co-led an international study published in the Proceedings of the National Academy of Sciences (PNAS). The work tests whether large language models (LLMs) can predict well-being across diverse populations.

The team analyzed responses from 64,000 individuals across 64 countries. The dataset provides a rare, large-scale look at how AI tracks human well-being globally.

Key results

  • LLMs pick up broad correlates of well-being, including income and health.
  • Accuracy declines in underrepresented regions, reflecting biases tied to digital and economic inequalities.
  • Models lean on surface-level language cues rather than deeper context, which can lead them to misestimate well-being in unfamiliar settings.
  • Adding data from underrepresented regions improves predictions, though gaps remain.
  • LLMs are not a substitute for human-collected data; high-quality, representative human data is still essential.

Why this matters for researchers and policy teams

LLM inferences can serve as an early signal, but they should not drive decisions without strong ground truth. If you work across countries or in low-data environments, expect performance to vary with how well each region and demographic group is represented in online data.

Use LLM outputs as hypotheses, not endpoints. Pair them with validated surveys, stratified sampling, and careful replication across contexts.

Practical guidance for research design

  • Audit representativeness: check coverage across languages, regions, and socioeconomic groups.
  • Calibrate with ground truth: regularly benchmark LLM predictions against local survey data.
  • Probe failure modes: document where cues mislead the model (idioms, code-switching, sparse data).
  • Invest in local data pipelines: even small, high-quality samples can correct bias materially.

Read the study

Find the publication in Proceedings of the National Academy of Sciences (PNAS).


More updates will follow on NTU's growing impact in globally relevant AI and social science research.
