Stopping AI from Teaching Bioweapon Creation by Filtering Training Data
What if the key to preventing AI from assisting in building biological weapons is as simple as never teaching it how? This question drove Stella Biderman, executive director of EleutherAI, and her collaborators at the UK government's AI Security Institute to explore a new approach to AI safety, one that had received almost no public study before.
Their recent paper, Deep Ignorance, presents a method of filtering out risky information from an AI's training data. By removing content related to potentially dangerous topics like bioweapons at the outset, they effectively "bake in" safety measures that are more resistant to tampering—even for open-source models anyone can access and modify.
The team trained AI models on datasets scrubbed of proxy information, benign stand-ins for genuinely hazardous content. Importantly, this filtering did not significantly degrade overall performance: the cleaned-data models struggled to produce harmful information but maintained strong results on general tasks.
Stephen Casper, one of the lead authors, said the goal was to make large language models (LLMs) "safe off the shelf" and resistant to harmful modification later on. This contrasts with typical safety efforts, which adjust models after training. Such post-training tweaks, while useful in the short term, are easier to reverse and can unintentionally degrade the model's capabilities.
Pre-training filtering offers a more durable solution by embedding safety directly into the model’s knowledge base, making it harder to bypass later.
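To make the idea concrete, here is a minimal sketch of what filtering a corpus before pre-training can look like. This is an illustration only, not the paper's actual pipeline: the blocklist terms and function names are hypothetical placeholders, and real systems typically use trained classifiers rather than keyword matching.

```python
# Illustrative sketch of pre-training data filtering.
# BLOCKLIST terms are hypothetical placeholders; production pipelines
# generally rely on learned classifiers, not simple keyword lists.
BLOCKLIST = {"hazardous_term_a", "hazardous_term_b"}

def is_safe(document: str) -> bool:
    """Return True if the document mentions no blocklisted term."""
    text = document.lower()
    return not any(term in text for term in BLOCKLIST)

def filter_corpus(corpus: list[str]) -> list[str]:
    """Keep only documents judged safe, before pre-training begins."""
    return [doc for doc in corpus if is_safe(doc)]

corpus = [
    "A tutorial on hazardous_term_a synthesis.",
    "A recipe for sourdough bread.",
]
print(filter_corpus(corpus))  # only the bread recipe survives
```

Because the risky documents never enter training, the model has nothing to "unlearn" later, which is what makes the safety harder to strip out than a post-training patch.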
Biderman highlighted that few public studies have taken on this pre-training filtering approach due to its high costs and complexity. Large AI companies like OpenAI and Anthropic have the resources but rarely disclose their methods for competitive and copyright reasons. However, OpenAI’s recent model cards suggest they use similar filtering techniques to remove hazardous biosecurity data before training.
Biderman sees Deep Ignorance as a way to increase transparency and encourage broader adoption of such safety measures. She criticized the tech industry’s frequent claim that their enormous datasets are too vast to fully document or review, calling it a convenient excuse that hinders progress in AI safety.
AI in the News
Cohere Secures $500 Million and Adds AI Leadership
Cohere announced a $500 million funding round led by Inovia Capital and Radical Ventures, valuing the company at $6.8 billion. This includes investments from AMD Ventures, NVIDIA, PSP Investments, and Salesforce Ventures. Alongside the funding, Cohere hired Joelle Pineau, formerly of Meta AI, as chief AI officer, and Francois Chadwick as chief financial officer.
Cohere’s CEO Aidan Gomez emphasized their security-first approach, noting rapid growth and increasing adoption of their AI solutions.
Study Finds AI Use Can Erode Doctors’ Diagnostic Skills
A new study published in The Lancet Gastroenterology & Hepatology warns of unintended consequences of medical AI use. Researchers observed that doctors who had relied on AI to detect precancerous colon growths saw a roughly 20% drop in detection rates once the AI assistance was removed.
The study suggests that over-dependence on AI tools may reduce clinicians’ motivation, focus, and sense of responsibility when working without AI support. This finding is particularly relevant as health systems worldwide increasingly integrate AI into diagnostics.
Anthropic Expands Enterprise AI Talent with Humanloop Team
Anthropic has acqui-hired the co-founders and most of the team from UK startup Humanloop, known for enterprise AI tools like prompt management and model evaluation. Key figures joining include CEO Raza Habib, CTO Peter Hayes, and CPO Jordan Burgess.
This move strengthens Anthropic’s enterprise capabilities by bringing in experienced engineers and researchers who have worked with clients such as Duolingo and Gusto. The deal, however, did not include Humanloop’s intellectual property.
AI Market Share Snapshot
- ChatGPT – 78.5%, dominating the generative AI market
- Gemini – 8.7%
- DeepSeek – 4.1%
- Grok – 2.5%
- Perplexity – 1.9%
- Claude – 1.6%
- Copilot – 1.2%
ChatGPT is also the fifth most-visited website globally and the fastest-growing, with traffic up 134.9% year over year. It debuted in November 2022.