LLMs Mirror A Culture-Wide Bias Against Older Women - Here's What Builders And Teams Can Do
New research in Nature points to a hard truth: stereotypes about gender and age are baked into our online content and echoed by large language models. Across images, videos, and billions of words, older women are depicted as younger, less experienced, and less qualified than they are. That distortion bleeds into AI outputs used for work, hiring, and everyday decisions.
The takeaway for IT, developers, and operators is straightforward: if you use generative AI in any people-facing workflow, you need bias testing, guardrails, and human oversight. Without them, your tools will quietly tilt decisions.
What The Research Found
Analyzing 1.4 million images and videos from Google, Wikipedia, IMDb, Flickr, and YouTube, researchers found that women are portrayed as younger than men, especially in higher-status, higher-paid occupations. The same pattern shows up in image datasets used to train machine learning systems.
Nine leading LLMs were also examined. Even without visuals, models still produced biased and distorted depictions of older women, suggesting that age-gender bias sits inside the concepts and associations these systems learn from human text at scale.
In related work, roughly 45 minutes of exposure to occupation images was enough to shift people's gender associations with jobs. If under an hour can tilt perception, a full day online compounds the effect.
Stereotypes Reflected In Resumes
To test a practical use case, the team prompted a popular LLM to generate 34,500 resumes for 54 occupations using typical male and female names. Resumes for women skewed younger and less experienced. The model then rated its own outputs and gave older men the highest scores, even when the initial inputs were identical to those used for women's resumes.
This pattern puts older women and younger applicants at a disadvantage while favoring older men. As one researcher noted, "When [an LLM] creates a resume, it draws on ideas about what makes a good candidate and which skills are relevant. There are countless opportunities for stereotypes to slip in."
Why Filters Aren't Enough
AI providers often add filters to block overtly biased content. That helps at the surface. But gendered ageism is often subtle: which achievements get highlighted, which skills are emphasized, what age a role "sounds like." Filters miss that. As the researchers argue, fixing this requires deeper changes to data and model training, not cosmetic patches.
Action Plan For Engineers, Data Teams, And Hiring Operators
- Set policy: age and gender are sensitive attributes. Do not request or infer age in resume or candidate flows unless required by law. Remove graduation years and age-coded phrases before scoring.
- Use structured prompts and templates. Specify neutral, skill-first fields: scope, impact, metrics, tools, domain. Avoid descriptors like "young," "energetic," or "seasoned" unless they map to explicit skills or years-of-experience ranges.
- Counterfactual testing at scale. Generate pairs of resumes that differ only in names and age markers; ratings should remain stable. Alert on gaps in experience length, leadership verbs, or score deltas (a counterfactual test sketch follows this list).
- Name-and-age swapping. Test "Ava, 58" vs. "Ethan, 58," and "Ava, 28" vs. "Ethan, 28." Compare experience years, seniority levels, and confidence language. Log differences.
- Post-processing rules. Strip age cues and penalize age-coded adjectives (an age-cue scrubbing sketch follows this list). Score strictly on evidence (projects shipped, revenue impact, incident response time, patents, publications, code merged).
- Retrieval augmentation with curated corpora. Feed examples that feature experienced women succeeding in high-status roles. Balance the reference set the model leans on.
- Human-in-the-loop review. Blind reviews before final decisions. Rotate reviewers. Require justification tied to job criteria, not biographies.
- Fairness metrics and monitoring. Track rating parity by gender-coded names and age bands. Use counterfactual fairness checks and threshold tests, and escalate on drift (a parity-check sketch follows this list).
- Vendor diligence. Ask for test reports, dataset summaries, and override controls. Require the ability to disable age inferences and enforce structured scoring.
- Compliance frameworks. Map controls to recognized guidance such as the NIST AI Risk Management Framework. Avoid fully automated employment decisions.
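The age-cue scrub referenced above can start as a simple regex pass over resume text. Here is a minimal sketch, assuming plain-text resumes; the patterns, the phrase list, and the choice to mask years only on education lines are illustrative and should be tuned to your own data:

```python
import re

# Illustrative, non-exhaustive patterns -- tune these to your own candidate data.
AGE_CODED_TERMS = [
    r"\bdigital native\b",
    r"\byoung,?(?: and)? energetic\b",
    r"\bseasoned\b",
    r"\brecent graduate\b",
]
DOB_OR_AGE_LINE = re.compile(
    r"^\s*(date of birth|dob|age)\s*[:\-].*$", re.IGNORECASE | re.MULTILINE
)
EDUCATION_HINT = re.compile(
    r"graduat|class of|university|college|bachelor|master", re.IGNORECASE
)
YEAR = re.compile(r"\b(19|20)\d{2}\b")

def scrub_age_cues(resume_text: str) -> str:
    """Strip explicit and implicit age signals before a model scores the resume."""
    text = DOB_OR_AGE_LINE.sub("", resume_text)  # drop DOB / age lines
    cleaned = []
    for line in text.splitlines():
        # Mask years only on education lines so employment dates stay intact.
        if EDUCATION_HINT.search(line):
            line = YEAR.sub("[year removed]", line)
        for pattern in AGE_CODED_TERMS:
            line = re.sub(pattern, "", line, flags=re.IGNORECASE)
        cleaned.append(line)
    return "\n".join(cleaned)
```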
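The counterfactual test can be scripted against whatever scorer you already use. Below is a sketch built around a hypothetical `score_resume` hook; the names, ages, and 0.05 tolerance are placeholders, not recommendations:

```python
from itertools import product
from statistics import mean

NAMES = {"female-coded": ["Ava", "Maria"], "male-coded": ["Ethan", "James"]}
AGES = [28, 58]

def score_resume(resume_text: str) -> float:
    """Hypothetical scoring hook: wrap your own model call or rubric here."""
    raise NotImplementedError("plug in your LLM or rubric-based scorer")

def counterfactual_gap(resume_template: str, tolerance: float = 0.05) -> dict:
    """Score the same resume with only name and age swapped; flag large gaps.

    resume_template should contain {name} and {age} placeholders.
    """
    scores = {}
    for (group, names), age in product(NAMES.items(), AGES):
        scores[(group, age)] = mean(
            score_resume(resume_template.format(name=name, age=age))
            for name in names
        )
    gaps = {
        age: scores[("male-coded", age)] - scores[("female-coded", age)]
        for age in AGES
    }
    return {"scores": scores, "gaps": gaps,
            "alert": any(abs(g) > tolerance for g in gaps.values())}
```

In practice you might run this over a batch of templates per occupation and log the per-age gaps alongside the model version and prompt, so regressions are traceable.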
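For ongoing monitoring, a parity check over already-scored candidates is enough to catch drift. This is a minimal sketch; the record schema, age-band edges, and alert threshold are assumptions for illustration:

```python
from collections import defaultdict
from statistics import mean

def age_band(age: int) -> str:
    """Bucket ages into coarse bands for parity reporting (illustrative edges)."""
    return "under 40" if age < 40 else "40-54" if age < 55 else "55+"

def parity_alerts(records: list[dict], alert_gap: float = 0.05) -> list[str]:
    """Flag age bands where mean scores diverge by gender-coded name.

    Assumed record schema: {"score": float, "gender_code": "F" or "M", "age": int}.
    """
    by_group = defaultdict(list)
    for r in records:
        by_group[(age_band(r["age"]), r["gender_code"])].append(r["score"])

    alerts = []
    for band in sorted({key[0] for key in by_group}):
        f_scores = by_group.get((band, "F"), [])
        m_scores = by_group.get((band, "M"), [])
        if f_scores and m_scores:
            gap = mean(m_scores) - mean(f_scores)
            if abs(gap) > alert_gap:
                alerts.append(f"{band}: mean score gap {gap:+.3f} exceeds {alert_gap}")
    return alerts
```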
For Teams Using LLMs To Write Resumes Or Job Posts
- Prompt for impact and metrics, not persona. Example: "Summarize outcomes, scope, and tech for a Staff SWE resume. Exclude age and personal traits."
- Run a counter-bias pass: "Rewrite to be age- and gender-neutral. Verify experience years reflect input."
- For job posts, ban age-coded language ("digital native," "young team"); a phrase-linter sketch follows this list. List must-have skills and concrete performance expectations.
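A phrase linter for job posts can run in CI or in the drafting tool. Here is a small sketch; the term list and suggested rewrites are illustrative starting points, not a complete vocabulary:

```python
import re

# Illustrative term list -- extend with phrases surfaced in your own reviews.
AGE_CODED_JOB_POST_TERMS = {
    "digital native": "name the actual tool or platform skills required",
    "young team": "describe the team's mission and working style instead",
    "recent graduate": "state the required skills or experience range",
    "high energy": "describe the workload and pace concretely",
}

def lint_job_post(text: str) -> list[str]:
    """Return warnings for age-coded phrases, with suggested rewrites."""
    warnings = []
    for term, suggestion in AGE_CODED_JOB_POST_TERMS.items():
        if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE):
            warnings.append(f"Found '{term}': {suggestion}.")
    return warnings

if __name__ == "__main__":
    post = "We're a young team looking for a digital native to ship fast."
    print("\n".join(lint_job_post(post)))
```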
Why This Matters
Biased outputs aren't just unfair; they reduce signal. Teams miss qualified candidates, skew promotions, and overfit to stereotypes that have nothing to do with capability. As one researcher put it, to make progress "the bias has to be addressed at a fundamental level." Until then, assume distortion and instrument your stack.
Sources And Further Reading
Study published in Nature. For practical governance, see the NIST AI Risk Management Framework referenced above.