AI Simulations Offer New Tools and Challenges for Social Science Research

AI models like GPT-4 can simulate human responses to aid social science research, cutting cost and time. But they remain limited by bias, narrowed variation, and poor generalization, so human data remains essential.

Published on: Jul 26, 2025

Social Science Moves In Silico

Social science research plays a critical role in shaping effective marketing, responsive policies, and strategies for public health and safety. It covers economics, psychology, sociology, and political science, employing methods ranging from fieldwork to randomized trials. But the challenge lies in studying people—complex, unpredictable subjects who resist easy experimentation.

Jacy Anthis, a visiting scholar at Stanford’s Institute for Human-Centered AI and a PhD candidate, points out that, unlike subjects in tightly controlled lab settings, people are difficult to experiment on over long periods. The result is studies that are costly, time-consuming, and often hard to replicate.

Advances in AI, particularly large language models (LLMs), offer a new approach: simulating human data. These models can roleplay diverse human subjects or expert social scientists, enabling researchers to test assumptions, run pilot studies, and estimate sample sizes at a fraction of the cost.

“These models are remarkably similar to people and give us an opportunity to add them into any part of the social science research pipeline,” says Anthis.

However, LLMs have limitations. They tend to produce less varied, sometimes biased, or overly agreeable responses and struggle to generalize to new contexts. Still, initial methods show promise, and with further work, these tools could keep pace with societal and technological changes.

Evaluating AI as a Human Proxy

Assessing how well AI mimics human behavior is crucial. Luke Hewitt and colleagues at Stanford tested GPT-4’s ability to replicate results from 476 previously conducted randomized controlled trials (RCTs). These trials typically involve exposing participants to a treatment—like reading a text or watching a video—and measuring attitude or behavior changes compared to a control group.

The team found that GPT-4’s simulated responses correlated strongly (0.85) with the actual treatment effects, an accuracy on par with predictions made by human experts. Notably, the model performed well even on studies published after its training data cutoff.
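
To make the evaluation concrete, the comparison can be sketched in a few lines of code: simulate outcomes for each study’s treatment and control conditions, take the difference in means as the simulated effect, and correlate those effects with the published ones. The sketch below is illustrative only; `simulate_outcome` is a hypothetical wrapper around a language model, and the study fields are invented names, not the Stanford team’s actual pipeline.

```python
import statistics
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def simulated_effect(study, simulate_outcome, n=200):
    """Difference in mean simulated outcomes between treatment and control.

    `simulate_outcome(prompt)` is a hypothetical helper that asks a language
    model to roleplay one participant and returns a numeric outcome.
    """
    treated = [simulate_outcome(study["treatment_prompt"]) for _ in range(n)]
    control = [simulate_outcome(study["control_prompt"]) for _ in range(n)]
    return statistics.mean(treated) - statistics.mean(control)

def replication_correlation(studies, simulate_outcome):
    """Correlate simulated treatment effects with the published effects."""
    simulated = [simulated_effect(s, simulate_outcome) for s in studies]
    observed = [s["observed_effect"] for s in studies]
    return pearson(simulated, observed)
```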

“Many expected the model to fail on new experiments it had not seen before, but it made fairly accurate predictions,” Hewitt notes.

Newer models, with web search capabilities and more recent training data, are harder to evaluate. Creating archives of unpublished studies might be necessary to properly validate them.

AI Is Narrow-Minded

Despite accuracy in some areas, LLMs struggle with distributional alignment—the ability to reproduce the full range of human responses. For example, in a “pick a number” task, models often select a narrower and more predictable range than humans.

Nicole Meister, a Stanford graduate student, explains that reading a response distribution straight from the model’s token “log probability” scores does not capture human-like variation well. Asking the model to simulate multiple individuals, or to verbalize a distribution outright, yields better results.
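
Those two elicitation strategies can be sketched as follows. This is a rough illustration, not Meister’s code: `ask_llm` is a hypothetical stand-in for whatever model API is used, and the prompts, options, and JSON format are assumptions made for the example.

```python
import json
from collections import Counter

def distribution_by_personas(ask_llm, question, options, n=100):
    """Elicit a response distribution by roleplaying many simulated individuals."""
    counts = Counter()
    for i in range(n):
        prompt = (f"You are survey respondent #{i}. Answer with exactly one of "
                  f"{options}.\n\nQuestion: {question}")
        answer = ask_llm(prompt).strip()
        if answer in options:  # ignore off-list replies in this sketch
            counts[answer] += 1
    total = sum(counts.values()) or 1
    return {opt: counts[opt] / total for opt in options}

def distribution_verbalized(ask_llm, question, options):
    """Ask the model to state a full distribution directly, as JSON."""
    prompt = (f"Estimate how a representative sample of adults would answer.\n"
              f"Question: {question}\nOptions: {options}\n"
              f"Reply only with a JSON object mapping each option to a probability.")
    return json.loads(ask_llm(prompt))  # a real pipeline would parse more defensively

def total_variation(p, q):
    """Total variation distance between two distributions over the same options."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
```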

Meister’s team found that providing LLMs with example distributions from related questions, an approach called “few-shot” steering, improved alignment with human responses—especially for opinion-based questions. However, this method is less effective for preferences, which are less predictable.
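
Few-shot steering amounts to showing the model human response distributions from related questions before asking about the target question. The sketch below builds such a prompt; the wording and the example survey numbers are invented for illustration and are not drawn from the study.

```python
def few_shot_steering_prompt(examples, target_question, options):
    """Build a prompt that shows human response distributions for related
    questions, then asks for a distribution over the target question.

    `examples` is a list of (question, {option: share}) pairs taken from
    human survey data on related items.
    """
    parts = ["Here is how surveyed adults answered some related questions:"]
    for question, dist in examples:
        shares = ", ".join(f"{opt}: {share:.0%}" for opt, share in dist.items())
        parts.append(f"Q: {question}\nA (distribution): {shares}")
    parts.append(
        f"Now estimate the distribution of answers to:\n"
        f"Q: {target_question}\nOptions: {options}\nA (distribution):"
    )
    return "\n\n".join(parts)

# Illustrative usage with made-up survey numbers:
examples = [
    ("Do you support stricter vehicle emissions standards?", {"Yes": 0.61, "No": 0.39}),
    ("Should public transit funding increase?", {"Yes": 0.57, "No": 0.43}),
]
print(few_shot_steering_prompt(examples, "Do you support a carbon tax?", ["Yes", "No"]))
```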

“LLMs can misportray and flatten a lot of groups,” Meister warns. This calls into question the use of LLMs for predicting product preferences.

Other Challenges: Validation, Bias, Sycophancy, and More

LLMs pose risks if used improperly in social science research. Hewitt emphasizes the need for clear validation to know when model predictions can be trusted. Without quantifying uncertainty, users may either overtrust or dismiss model outputs.

Anthis highlights additional challenges:

  • Bias: Models often reinforce racial, ethnic, and gender stereotypes.
  • Sycophancy: Assistant-style models tend to give agreeable answers, sometimes at the expense of accuracy.
  • Alienness: Answers may sound human but can be logically inconsistent or bizarre, such as incorrect math solutions.
  • Generalization: LLMs struggle to extend findings beyond their training data, limiting studies on new populations or large group behaviors.

While bias and sycophancy can be mitigated using techniques like roleplaying experts or fine-tuning, addressing alienness and generalization requires a deeper theoretical understanding of these models.

Current Best Practice? A Hybrid Approach

Despite these shortcomings, LLMs are valuable when paired with human data. Stanford sociology student David Broska advocates a mixed-subjects design that combines human responses with LLM predictions. This “prediction-powered inference” approach uses the human responses to correct bias in the model’s predictions while drawing statistical precision from the much larger pool of simulated data.

Running a small pilot study with both humans and an LLM helps estimate the optimal mix for statistically significant outcomes, potentially reducing costs without sacrificing reliability.
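
The core of prediction-powered inference is simple: take the cheap estimate computed from many model predictions, then correct it with the gap between human outcomes and model predictions on the small paired sample. The sketch below shows that basic mean estimator; it is a minimal illustration under those assumptions, not Broska’s implementation, and the variable names are invented.

```python
import statistics

def ppi_mean(human_outcomes, llm_on_human, llm_on_unlabeled):
    """Prediction-powered estimate of a mean outcome.

    human_outcomes   -- outcomes measured on the small human pilot sample
    llm_on_human     -- LLM-predicted outcomes for those same participants
    llm_on_unlabeled -- LLM-predicted outcomes where no human data exists
    """
    # Cheap estimate from the many model predictions alone ...
    model_estimate = statistics.mean(llm_on_unlabeled)
    # ... corrected by the average gap between humans and the model
    # on the paired pilot sample (the "rectifier").
    rectifier = statistics.mean(
        y - f for y, f in zip(human_outcomes, llm_on_human)
    )
    return model_estimate + rectifier
```

The better the model’s predictions track the human outcomes in the pilot sample, the smaller the correction term, and the more the large pool of unlabeled predictions contributes to precision.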

“At the end of the day, if you’re studying human behavior, your experiment needs to ground out in human data,” Broska stresses.

Hewitt agrees, noting that while LLM simulations can guide study design—such as selecting survey wording or experimental conditions—human subjects remain essential for grounding findings in reality.

For those interested in applying AI tools responsibly in research, exploring specialized training courses can provide practical skills and frameworks. Visit Complete AI Training to find relevant resources.

