Jan Leike Heads Anthropic's Alignment Science Team
Jan Leike, one of the field's most recognized AI safety researchers, is building out Anthropic's Alignment Science team. The move represents a significant commitment to alignment research from one of the industry's most prominent labs.
Leike left OpenAI in May 2024 after publicly raising concerns about the company's safety priorities, and joined Anthropic that same month. At Anthropic, founded by former OpenAI researchers Dario and Daniela Amodei, he leads a team attacking some of the hardest unsolved problems in making AI systems behave as intended.
What the team is working on
The core challenge Leike's team addresses is straightforward to state but difficult to solve: how do you train an AI system to behave correctly on tasks where humans themselves struggle to evaluate the output?
The team is pursuing several research directions:
- Scalable oversight - developing techniques that allow humans to maintain meaningful control over AI systems as those systems become more capable than their overseers
- Weak-to-strong generalization - transferring alignment properties from less powerful models to more powerful ones
- Robustness to jailbreaks - preventing users from tricking AI systems into ignoring safety guidelines
- Automating alignment research - using AI agents that are sufficiently aligned to propose ideas and run experiments on alignment techniques
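Of these directions, weak-to-strong generalization is the easiest to illustrate concretely. The core experiment: train a small "weak supervisor" on ground-truth labels, have it label data for a more capable "strong student," and then measure whether the student can recover performance beyond its noisy supervisor. The toy sketch below (illustrative only, not Anthropic's actual setup; model choices and dataset are assumptions for the demo) uses scikit-learn stand-ins for the weak and strong models:

```python
# Toy weak-to-strong generalization sketch (not Anthropic's method):
# a weak supervisor labels data for a stronger student, and we compare
# both against held-out ground truth.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a task with ground-truth labels.
X, y = make_classification(n_samples=4000, n_features=20,
                           n_informative=10, random_state=0)

# Small ground-truth set for the supervisor; the rest is unlabeled
# from the student's point of view, plus a held-out test set.
X_sup, X_rest, y_sup, y_rest = train_test_split(
    X, y, train_size=500, random_state=0)
X_train, X_test, y_train_true, y_test = train_test_split(
    X_rest, y_rest, test_size=0.4, random_state=0)

# Weak supervisor: a simple model trained on the small labeled set.
weak = LogisticRegression(max_iter=1000).fit(X_sup, y_sup)
weak_labels = weak.predict(X_train)  # noisy supervision signal

# Strong student: a more capable model trained ONLY on weak labels.
strong = RandomForestClassifier(n_estimators=200, random_state=0)
strong.fit(X_train, weak_labels)

weak_acc = weak.score(X_test, y_test)
strong_acc = strong.score(X_test, y_test)
print(f"weak supervisor accuracy: {weak_acc:.3f}")
print(f"strong student accuracy:  {strong_acc:.3f}")
```

Whether the student actually exceeds its supervisor depends on the task and models; the research question is precisely under what conditions that gap closes, and whether the effect scales to alignment-relevant properties rather than toy accuracy.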
Background and influence
Leike previously worked at DeepMind before joining OpenAI in 2021, where he co-led the Superalignment project from its launch in July 2023. That project specifically targeted alignment for superintelligent AI systems.
His research continues to influence how other labs and academic groups approach alignment. Work from his team, particularly on weak-to-strong generalization and automated alignment research, is shaping research agendas across the industry. Leike maintains an active publication record through Anthropic's blog and his personal writing.