Stanford study finds AI tutors give unequal feedback based on student race, gender and ability

Stanford researchers found AI tutors give more rigorous, development-focused feedback to students perceived as White or high-achieving, while others receive grammar corrections and hollow praise. The bias persisted even when prompts were identical.

Published on: Apr 01, 2026

Stanford University researchers have identified consistent bias in how AI tutors respond to students, with large language models producing different feedback depending on perceived identity, language background, and academic ability.

The findings, published as a preprint on arXiv, show that feedback varies in depth, focus, and instructional intent across student profiles, even when the underlying prompts remain identical.

High-achieving and White students receive more rigorous feedback

Students identified as high-achieving or White were more likely to receive detailed, development-focused feedback. This included critique of argumentation, suggestions for improving reasoning, and prompts to extend ideas.

Responses associated with Hispanic students or English language learners, by contrast, emphasized grammar, spelling, and formality. While these elements matter, the shift in emphasis reduces attention to higher-order thinking and content development.

For students labeled as low-achieving or having learning disabilities, AI tutors showed a pattern of withholding critical feedback. Instead, they provided positive reinforcement with limited actionable guidance.

Bias runs deeper than surface-level language

The study examined how AI tutors generate and refine responses across multiple dimensions: factual accuracy, depth of analysis, presentation quality, and use of evidence. Models were capable of producing high-quality feedback, but consistency varied depending on the student profile presented.

In some cases, AI outputs referenced cultural or gender stereotypes, particularly in responses linked to non-White or female students. This indicates that bias extends beyond language choices into how feedback is framed and what assumptions underpin it.

These patterns emerged even when prompts were held constant, with only student descriptors changing. The finding raises questions about how models interpret identity signals and adjust their instructional approach in response.
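The counterfactual design described above, where the tutoring prompt is held constant and only the student descriptor changes, can be sketched as follows. This is an illustrative reconstruction, not the study's actual materials: the essay text, prompt template, and descriptor list are hypothetical.

```python
# Illustrative counterfactual-prompt setup: one identical tutoring prompt,
# varied only in the student descriptor. (All strings here are examples,
# not the study's actual prompts.)
ESSAY = "Climate change affects coastal cities through rising sea levels."

TEMPLATE = (
    "You are a writing tutor. The student is {descriptor}. "
    "Give feedback on this essay:\n{essay}"
)

DESCRIPTORS = [
    "a high-achieving student",
    "an English language learner",
    "a student with a learning disability",
]

def build_counterfactual_prompts(essay, descriptors):
    """Return one prompt per descriptor; everything else is identical."""
    return {d: TEMPLATE.format(descriptor=d, essay=essay) for d in descriptors}

prompts = build_counterfactual_prompts(ESSAY, DESCRIPTORS)

# Sanity check: with the descriptor masked out, every prompt collapses to the
# same string, so any difference in model feedback is attributable to the
# descriptor alone.
masked = {p.replace(d, "{descriptor}") for d, p in prompts.items()}
assert len(masked) == 1
```

Each prompt would then be sent to the model under test and the responses compared for depth, focus, and instructional intent.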

What this means for schools and EdTech providers

AI tutors are already integrated into learning platforms across K-12, higher education, and workforce training, often marketed as tools for scalable, personalized support. The Stanford findings suggest that personalization may introduce variability that is difficult to detect without structured evaluation.

If feedback differs based on perceived identity or ability, the consistency of learning support becomes harder to guarantee. For EdTech providers, this shifts focus from what systems can do to how they behave in practice.

Systems may need stronger guardrails, clearer evaluation frameworks, and ongoing monitoring to ensure outputs align with pedagogical goals. Educators adopting these tools should be aware that AI-generated feedback is increasingly used in formative assessment and writing support, areas where consistency and fairness are critical.
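As a minimal sketch of what such ongoing monitoring could look like: tally whether each piece of tutor feedback focuses on content (argument, reasoning) or on surface features (grammar, formality), then flag student groups whose share of content-focused feedback falls well below the overall rate. The group names, category labels, and threshold here are illustrative assumptions, not a standard from the study.

```python
# Hypothetical fairness-monitoring sketch: flag groups whose rate of
# content-focused feedback lags the overall rate by more than a set margin.
from collections import defaultdict

def audit_feedback(records, threshold=0.15):
    """records: iterable of (group, focus) pairs, focus in {'content', 'surface'}.

    Returns {group: content_rate} for groups whose content-focused-feedback
    rate falls more than `threshold` below the overall rate.
    """
    totals = defaultdict(int)
    content = defaultdict(int)
    for group, focus in records:
        totals[group] += 1
        if focus == "content":
            content[group] += 1
    overall = sum(content.values()) / sum(totals.values())
    flagged = {}
    for group in totals:
        rate = content[group] / totals[group]
        if overall - rate > threshold:
            flagged[group] = round(rate, 2)
    return flagged

# Toy example: ELL students receive mostly surface-level feedback.
sample = [
    ("high-achieving", "content"), ("high-achieving", "content"),
    ("high-achieving", "content"),
    ("ELL", "surface"), ("ELL", "surface"), ("ELL", "content"),
]
print(audit_feedback(sample))
```

A production audit would need far richer labels (ideally human-rated or rubric-scored feedback) and statistical significance checks, but the core loop of grouping, rating, and comparing rates is the same.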

The research does not argue that AI tutors should be removed from classrooms. Instead, it highlights a gap between how these systems are positioned and how they currently perform.

Understanding Generative AI and LLM behavior is essential for educators implementing these tools. For a broader overview of AI for Education, resources are available to help institutions evaluate and monitor these systems responsibly.
