AI keeps Neanderthals stuck in outdated museum scenes
Ask popular models to depict Neanderthals "accurately," and they still default to clichés: hulking men, lonely hunts, and props from the wrong century. A new study shows that even with expert prompts and demands for correctness, outputs drift toward decades-old scholarship. That's a problem for classrooms, public outreach, and any quick search that looks credible at a glance.
The experiment in plain terms
Researchers used ChatGPT and DALL·E 3 to generate hundreds of Neanderthal scenes with four prompt styles: casual vs. expert, and with vs. without prompt revision. Each instruction set was run 100 times. Even under "be scientifically accurate" constraints, the systems mostly changed tone and surface detail when asked to sound expert. The facts lagged.
What went wrong, consistently
- Gender erasure: Outputs centered on heavily muscled males and sidelined women and children. That contradicts modern archaeology that examines family life, care, and childhood.
- Anachronisms everywhere: Ladders, thatched roofs, woven baskets, glass vessels, and metal tools appeared in Neanderthal scenes. These objects don't match site evidence and create a slick but false timeline.
- Flat storytelling: Text focused on hunt-sleep-survive loops. Linguistic analysis showed the writing style aligned with early-1960s scholarship, while images matched late-1980s to early-1990s aesthetics.
- Tone ≠ accuracy: "Expert" outputs sounded confident but rarely updated claims. Style improved; substance didn't.
Why the regression happens
Models learn from what they can access. Paywalls and copyright limits keep much recent archaeology out of reach, while older work remains easier to scrape. Access gets mistaken for recency and trustworthiness.
As open access expands, newer research becomes more visible, but the gap remains uneven. Until training data reflects current, accessible scholarship, outputs will skew to what's easiest to ingest, not what's most accurate. For context on open access, see the Directory of Open Access Journals.
Why this matters for teaching and outreach
Students are pasting AI outputs into assignments. Museums and textbooks work to correct old stereotypes, yet a confident paragraph or polished image can reset those efforts in seconds. Without habits of verification, misinformation spreads faster than careful instruction.
Practical steps for researchers and educators
- Use retrieval over recall: Connect your assistant to a curated corpus (recent peer-reviewed work, site reports, data papers). Require inline citations with links and publication years.
- Enforce currency: Prompt for "use sources from the last 5-10 years unless citing a seminal study; flag older sources explicitly." Reject answers that can't provide citations.
- Bias and anachronism checks: For images, add negative prompts (no metal tools, no glassware, no ladders, no thatched roofs) and specify context (region, season, known toolkits). Require a short rationale for depicted artifacts.
- Run prompt variants: Test casual vs. expert phrasing and revised prompts. Compare outputs across seeds to detect drift toward stereotypes.
- Structured review: Create a checklist for gender representation, presence of children/elder care, tool and material accuracy, and environmental context. Don't publish until each is verified against sources.
- Track provenance: Log the prompt, model version, temperature, seeds, and cited works. This audit trail simplifies replication and correction.
- Teach skepticism: Require students to attach two primary or peer-reviewed sources for each AI-derived claim. Grade the sources, not the polish.
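The anachronism-check and provenance-tracking steps above can be combined into a small audit helper. This is a minimal sketch, not part of the study: the `audit_output` function, the keyword list, and the model/seed labels are all illustrative assumptions, and a real check would use a richer vocabulary reviewed by a specialist.

```python
import json
import time

# Illustrative keyword list for objects absent from Neanderthal site
# evidence; a real audit would be curated by a domain expert.
ANACHRONISMS = ["ladder", "thatch", "basket", "glass", "metal"]

def audit_output(prompt: str, text: str, model: str, seed: int) -> dict:
    """Flag likely anachronisms in a generated text and log provenance."""
    flags = [w for w in ANACHRONISMS if w in text.lower()]
    return {
        "timestamp": time.time(),   # when the audit ran
        "model": model,             # model version string
        "seed": seed,               # seed used for the generation
        "prompt": prompt,           # exact prompt, for replication
        "anachronism_flags": flags, # matched keywords, if any
        "passed": not flags,        # publishable only if nothing matched
    }

record = audit_output(
    "Depict a Neanderthal camp accurately.",
    "A group weaves baskets beside a ladder.",
    model="example-model",
    seed=7,
)
print(json.dumps(record["anachronism_flags"]))  # ["ladder", "basket"]
```

Keeping the full record (prompt, model, seed, flags) in one JSON-serializable dict makes the audit trail trivial to store alongside each generated asset.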
Method note worth copying
The study's simple four-prompt framework is a useful audit template: run multiple seeds, flip expert vs. casual tone, and revise vs. not. Measure how outputs shift and where they fail. Repeating this as models update will show whether the gap is closing or if new biases appear.
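The four-condition grid described above is easy to enumerate programmatically. A minimal sketch, assuming you supply your own model-calling code; the condition labels and the function name are illustrative, not taken from the study:

```python
from itertools import product

def build_audit_grid(seeds):
    """Enumerate (tone, revision, seed) conditions for a repeatable audit."""
    tones = ["casual", "expert"]          # flip phrasing style
    revisions = ["unrevised", "revised"]  # with vs. without prompt revision
    return [
        {"tone": t, "revision": r, "seed": s}
        for t, r, s in product(tones, revisions, seeds)
    ]

# 2 tones x 2 revision settings x 100 seeds = 400 runs, mirroring the
# study's 100 runs per instruction set.
grid = build_audit_grid(range(100))
print(len(grid))  # 400
```

Re-running the same grid after each model update gives directly comparable numbers, which is what makes drift toward stereotypes measurable rather than anecdotal.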
Citation and further reading
The research appears in Advances in Archaeological Practice. Pair model outputs with the current archaeological record before using them in teaching materials, media, or public exhibits.
If you're building or evaluating AI for research
Bottom line: fast tools are useful, but only with guardrails. Curate sources, demand citations, and audit outputs. That's how we keep yesterday's stories from rewriting the past.