Ageism meets sexism: AI favors older men and penalizes older women

Researchers found ChatGPT made women an average of 1.6 years younger on CVs, then rated them less qualified, favoring younger women and older men. The fix: de-identify inputs, lock in a rubric, test for bias, and require human review.

Published on: Oct 09, 2025

AI bias makes women look younger and less qualified - and it shows up in hiring tasks

Researchers report that when asked to generate resumes for female and male names, ChatGPT made female candidates an average of 1.6 years younger than male candidates. The model then ranked those younger female candidates as less qualified, creating a loop where age and gender bias compound. The result favors younger women and older men, despite U.S. Census data showing male and female employees are roughly the same age. The skew held even in fields where women trend older, such as sales and service.

The likely culprit: patterns learned from the internet. Online images and text often depict women as younger than men in professional contexts, and those patterns bleed into model behavior. When age is inferred or implied, it gets treated as a proxy for experience - and women get downgraded.

Why IT, product, and engineering teams should care

  • If your org uses chatbots to screen, summarize, or rank candidates, hidden age-gender bias can slip into shortlists and notes.
  • Internal assistants that auto-generate bios, performance summaries, or skills profiles can quietly skew outcomes and reviews.
  • Compliance and trust take a hit. Even support tooling that "just drafts" content can influence final decisions.

What you can do this week

  • Remove demographic cues at the source: Standardize inputs. Strip names, graduation years, photos, and pronouns before any AI step (a minimal sketch follows this list).
  • Anchor on explicit criteria: Score only by years of relevant experience, skills, certifications, and outcomes. Ban "inferred" attributes like age.
  • Lock in a rubric: Use a fixed scoring grid (e.g., skills coverage, years with core stack, shipped projects, measurable impact) and require evidence for each score.
  • Constrain the model: In your system prompt, state: "Do not infer age or gender from names, dates, or context. If uncertain, say 'not provided.' Evaluate only against the rubric."
  • Run counterfactual tests: Swap gender-coded names across identical resumes and compare age assumptions, experience summaries, and final rankings.
  • Measure parity: Track average inferred age, qualification scores, and ranking by name sets. Set alert thresholds for gaps.
  • Separate summarization from ranking: First extract structured facts, then score against the rubric. No free-form "fit" judgments.
  • Add a "no-age" guardrail: If the model mentions age or implies it, force a retry or flag for review.
  • Keep a human in the loop: AI output is a draft, not a decision. Require documented human review for hiring and performance actions.
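
Below is a minimal Python sketch of two of the steps above: stripping obvious demographic cues before text reaches a model, and the "no-age" guardrail that flags outputs mentioning or implying age. The regex patterns, placeholder tokens, and function names are illustrative assumptions, not a complete redaction pipeline.

```python
import re

# Patterns are illustrative assumptions for this sketch; production
# redaction needs a proper PII pipeline, not three regexes.
PRONOUNS = re.compile(r"\b(he|she|him|her|his|hers)\b", re.IGNORECASE)
GRAD_YEAR = re.compile(r"\b(class of|graduated(?: in)?)\s+(19|20)\d{2}\b", re.IGNORECASE)
AGE_MENTION = re.compile(r"\b(\d{1,2}\s*(years?\s*old|y/?o)\b|age[:\s]+\d{1,2})", re.IGNORECASE)


def deidentify(resume_text: str, candidate_name: str) -> str:
    """Strip name, pronouns, and graduation-year cues before any AI step."""
    text = resume_text.replace(candidate_name, "[CANDIDATE]")
    text = PRONOUNS.sub("[PRONOUN]", text)
    text = GRAD_YEAR.sub("[GRADUATION YEAR REDACTED]", text)
    return text


def violates_no_age_guardrail(model_output: str) -> bool:
    """True if a draft mentions or implies age; the caller should retry or flag for review."""
    return bool(AGE_MENTION.search(model_output))
```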

A quick audit template you can replicate

  • Data prep: Build 50-100 paired resumes with identical content but different names (female/male), covering multiple roles.
  • Prompts: Ask the model to summarize qualifications and provide a rubric-based score. Explicitly forbid age inference.
  • Metrics: Compare average age mentions (should be "not provided"), average scores, rank-order stability, and justification overlap (a scoring-comparison sketch follows this list).
  • Sectors: Include roles in tech, sales, and service to spot sector-specific drift.
  • Regression: Re-run after any prompt, model, or policy change. Track trends over time.
  • Escalation: If gaps exceed your threshold, block deployment and require fixes (prompt changes, post-processing, or model swap).
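
Picking up on the metrics step, the sketch below assumes you already have a score_resume(resume, name) function that calls your model and returns a numeric rubric score; the name pairs and the 0.05 threshold are placeholders to replace with your own lists and policy.

```python
import statistics

# Counterfactual audit loop: identical resumes, swapped gender-coded names.
NAME_PAIRS = [("Emily Carter", "James Carter"), ("Aisha Khan", "Omar Khan")]


def audit(resumes, score_resume, threshold=0.05):
    """Score each resume under both names in every pair and compare averages."""
    female_scores, male_scores = [], []
    for resume in resumes:
        for female_name, male_name in NAME_PAIRS:
            female_scores.append(score_resume(resume, name=female_name))
            male_scores.append(score_resume(resume, name=male_name))
    gap = statistics.mean(male_scores) - statistics.mean(female_scores)
    return {
        "mean_female": statistics.mean(female_scores),
        "mean_male": statistics.mean(male_scores),
        "gap": gap,
        # Escalation step: block deployment if this fails.
        "pass": abs(gap) <= threshold,
    }
```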

Policy guardrails that actually work

  • No demographic inference: Ban age, gender, and similar inferences from names, dates, or photos.
  • Evidence-first scoring: Every score must map to a cited line in the source document (see the sketch after this list).
  • Logging: Keep prompts, outputs, and scores for audits. Sample regularly for bias testing.
  • User training: Teach teams to supply structured inputs and to reject outputs that stray from the rubric.
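
As a rough illustration of the evidence-first and logging guardrails, the sketch below assumes the model returns scores as structured records; the field names and log path are assumptions, not a fixed schema.

```python
import json
import time

# Every score must carry a quoted piece of evidence from the source document.
REQUIRED_FIELDS = {"criterion", "score", "evidence_quote"}


def validate_scores(scores):
    """Return a list of problems; any entry without cited evidence is rejected."""
    errors = []
    for entry in scores:
        missing = REQUIRED_FIELDS - entry.keys()
        if missing:
            errors.append(f"{entry.get('criterion', '?')}: missing {sorted(missing)}")
        elif not str(entry["evidence_quote"]).strip():
            errors.append(f"{entry['criterion']}: empty evidence citation")
    return errors


def log_run(prompt, output, scores, path="ai_screening_log.jsonl"):
    """Append prompt, output, and scores so bias testing can sample past runs."""
    record = {"timestamp": time.time(), "prompt": prompt, "output": output, "scores": scores}
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
```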

Level up your team's prompting and evaluation

If your org is building AI-assisted hiring or internal talent tools, bias-aware prompting and structured evaluation are non-negotiable. Train your team to write constraints, design rubrics, and run counterfactual tests before anything goes live.

Key takeaway

General-purpose chatbots can make women "younger" on paper and then read that as "less experienced." Don't let that leak into your systems. Strip demographic cues, force rubric-based scoring, test with counterfactuals, and keep humans accountable for the final call.

