Steering AI: New Method Enhances Control Over Large Language Models
Researchers at UC San Diego have developed a technique that offers more precise control over large language models (LLMs) such as Google's Gemini and OpenAI's ChatGPT. The method aims to improve the safety, reliability, and adaptability of AI systems by allowing developers to steer model outputs more effectively.
Mikhail Belkin, a professor at UC San Diego’s Halıcıoğlu Data Science Institute (HDSI), along with a multi-institutional team, introduced a novel “nonlinear feature learning” method. This approach identifies and manipulates the key internal features within an LLM’s network, enabling targeted modification of the model’s behavior.
Addressing AI Challenges
LLMs excel in text generation, language translation, and answering questions, but their responses can sometimes be unpredictable or problematic. Issues such as biased content, misinformation, and toxic language remain concerns.
Belkin explained that their method acts like understanding the individual ingredients in a recipe rather than just the final dish. By analyzing internal activations across model layers, they pinpoint features linked to specific concepts—such as toxicity or factual accuracy—and adjust them to encourage or discourage certain outputs.
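The general idea can be illustrated with a simplified sketch. The code below is not the team's code and uses a basic linear proxy (difference of class means) rather than their nonlinear feature learning method; the activations, dimensionality, and steering strength are all hypothetical. It shows the core recipe: estimate a direction in activation space associated with a concept, then shift hidden states along or against that direction at inference time.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hypothetical hidden-state dimensionality

# Pretend these are layer activations collected from prompts that do /
# do not exhibit the target concept (e.g. toxicity).
acts_with_concept = rng.normal(0.5, 1.0, size=(100, d))
acts_without_concept = rng.normal(-0.5, 1.0, size=(100, d))

# Simple linear proxy for a "feature direction": difference of means,
# normalized to unit length. (The paper's method is nonlinear; this is
# the classic linear variant used here only for illustration.)
direction = acts_with_concept.mean(axis=0) - acts_without_concept.mean(axis=0)
direction /= np.linalg.norm(direction)

def steer(hidden_state, direction, strength=-2.0):
    """Shift a hidden state along the concept direction.
    Negative strength discourages the concept; positive encourages it."""
    return hidden_state + strength * direction

h = rng.normal(size=d)           # a hypothetical hidden state
h_steered = steer(h, direction)  # pushed away from the concept

# Steering with negative strength reduces the state's projection
# onto the concept direction.
print(h @ direction > h_steered @ direction)  # True
```

In a real model, `steer` would be applied inside a forward hook at a chosen layer during generation, and the direction would be learned from the model's own activations rather than synthetic data.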
Practical Applications and Benefits
The technique has demonstrated effectiveness in reducing hallucinations (false information generation), harmfulness, and toxicity. It also enhances the AI’s grasp of diverse language styles, including Shakespearean English and poetic expressions.
One significant advantage of this approach is increased efficiency. By focusing on crucial internal features, fine-tuning requires less data and computational power, potentially lowering costs and making advanced AI technology more accessible.
This paves the way for creating AI applications specialized for particular tasks—like an assistant providing accurate medical information or creative AI tools that avoid stereotypes and clichés.
Collaboration and Resources
The research team includes experts from UC San Diego, MIT, Harvard, and the Broad Institute. Their work builds on recent publications in Science and PNAS, supported by organizations such as the National Science Foundation, the Simons Foundation, and the Office of Naval Research.
Computational resources were provided by the San Diego Supercomputer Center and the National Center for Supercomputing Applications at the University of Illinois.
The team has made their code publicly available to encourage further development in AI safety and control. Interested researchers can access the resources on Belkin’s website.
Looking Ahead
Rajesh Gupta, interim dean of the School of Computing, Information and Data Sciences (SCIDS) and founding director of HDSI at UC San Diego, highlighted the importance of controlling AI behavior as these technologies become more integrated into daily life. This research represents a meaningful step toward AI systems that are safer, more trustworthy, and more useful across a range of domains.