Stopping AI from Teaching Bioweapon Creation by Filtering Training Data
What if the key to preventing AI from assisting in building biological weapons is as simple as never teaching it how? This question drove Stella Biderman, executive director of EleutherAI, and her collaborators at the UK government's AI Security Institute to explore a new approach to AI safety, one that had received almost no public study before.
Their recent paper, Deep Ignorance, presents a method of filtering out risky information from an AI's training data. By removing content related to potentially dangerous topics like bioweapons at the outset, they effectively "bake in" safety measures that are more resistant to tampering—even for open-source models anyone can access and modify.
The team trained AI models on datasets scrubbed of proxy information, benign stand-ins for genuinely hazardous content. Importantly, this filtering did not significantly degrade overall performance: the cleaned-data models struggled to produce harmful information but maintained strong results on general tasks.
Stephen Casper, one of the lead authors, said the goal was to make large language models (LLMs) "safe off the shelf" and resistant to harmful modification later on. This contrasts with typical safety efforts, which adjust models after training. Such post-training tweaks, while useful in the short term, are easier to reverse and can unintentionally degrade the model's capabilities.
Pre-training filtering offers a more durable solution by embedding safety directly into the model’s knowledge base, making it harder to bypass later.
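To make the idea concrete, here is a minimal sketch of what filtering a corpus before pre-training can look like. This is an illustration only, not the paper's actual pipeline: the blocklist terms and function names are hypothetical placeholders, and real systems typically use trained classifiers rather than keyword matching.

```python
# Illustrative sketch of pre-training data filtering.
# BLOCKLIST terms are hypothetical placeholders; production pipelines
# generally rely on learned classifiers, not simple keyword lists.
BLOCKLIST = {"hazardous_term_a", "hazardous_term_b"}

def is_safe(document: str) -> bool:
    """Return True if the document mentions no blocklisted term."""
    text = document.lower()
    return not any(term in text for term in BLOCKLIST)

def filter_corpus(corpus: list[str]) -> list[str]:
    """Keep only documents judged safe, before pre-training begins."""
    return [doc for doc in corpus if is_safe(doc)]

corpus = [
    "A tutorial on hazardous_term_a synthesis.",
    "A recipe for sourdough bread.",
]
print(filter_corpus(corpus))  # only the bread recipe survives
```

Because the risky documents never enter training, the model has nothing to "unlearn" later, which is what makes the safety harder to strip out than a post-training patch.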
Biderman highlighted that few public studies have taken on this pre-training filtering approach due to its high costs and complexity. Large AI companies like OpenAI and Anthropic have the resources but rarely disclose their methods for competitive and copyright reasons. However, OpenAI’s recent model cards suggest they use similar filtering techniques to remove hazardous biosecurity data before training.
Biderman sees Deep Ignorance as a way to increase transparency and encourage broader adoption of such safety measures. She criticized the tech industry’s frequent claim that their enormous datasets are too vast to fully document or review, calling it a convenient excuse that hinders progress in AI safety.
AI in the News
Cohere Secures $500 Million and Adds AI Leadership
Cohere announced a $500 million funding round led by Inovia Capital and Radical Ventures, valuing the company at $6.8 billion. This includes investments from AMD Ventures, NVIDIA, PSP Investments, and Salesforce Ventures. Alongside the funding, Cohere hired Joelle Pineau, formerly of Meta AI, as chief AI officer, and Francois Chadwick as chief financial officer.
Cohere’s CEO Aidan Gomez emphasized their security-first approach, noting rapid growth and increasing adoption of their AI solutions.
Study Finds AI Use Can Erode Doctors’ Diagnostic Skills
A new study published in The Lancet Gastroenterology & Hepatology warns of unintended consequences of medical AI use. Researchers observed that doctors who had relied on AI to detect precancerous colon growths saw a roughly 20% drop in detection rates once the AI assistance was removed.
The study suggests that over-dependence on AI tools may reduce clinicians’ motivation, focus, and sense of responsibility when working without AI support. This finding is particularly relevant as health systems worldwide increasingly integrate AI into diagnostics.
Anthropic Expands Enterprise AI Talent with Humanloop Team
Anthropic has acqui-hired the co-founders and most of the team from UK startup Humanloop, known for enterprise AI tools like prompt management and model evaluation. Key figures joining include CEO Raza Habib, CTO Peter Hayes, and CPO Jordan Burgess.
This move strengthens Anthropic’s enterprise capabilities by bringing in experienced engineers and researchers who have worked with clients such as Duolingo and Gusto. The deal, however, did not include Humanloop’s intellectual property.
AI Market Share Snapshot
- ChatGPT – 78.5%, dominating the generative AI market
- Gemini – 8.7%
- DeepSeek – 4.1%
- Grok – 2.5%
- Perplexity – 1.9%
- Claude – 1.6%
- Copilot – 1.2%
ChatGPT is also the fifth most-visited website globally and the fastest-growing, with traffic up 134.9% year over year. It debuted in November 2022.