New Method Enables Precise Control and Safer Outputs in Large Language Models

UC San Diego researchers developed a method to better control large language models like ChatGPT, reducing harmful outputs and improving accuracy. This approach boosts efficiency by targeting key internal features.

Categorized in: AI News, Science and Research
Published on: May 14, 2025

Steering AI: New Method Enhances Control Over Large Language Models

Researchers at UC San Diego have developed a technique that offers more precise control over large language models (LLMs) such as Google's Gemini and OpenAI's ChatGPT. This innovation aims to improve the safety, reliability, and adaptability of AI systems by allowing developers to steer AI outputs more effectively.

Mikhail Belkin, a professor at UC San Diego’s Halıcıoğlu Data Science Institute (HDSI), along with a multi-institutional team, introduced a novel “nonlinear feature learning” method. This approach identifies and manipulates the key internal features within an LLM’s network, enabling targeted modification of the model’s behavior.

Addressing AI Challenges

LLMs excel in text generation, language translation, and answering questions, but their responses can sometimes be unpredictable or problematic. Issues such as biased content, misinformation, and toxic language remain concerns.

Belkin explained that their method works like understanding the individual ingredients in a recipe rather than just tasting the finished dish. By analyzing internal activations across model layers, the team pinpoints features linked to specific concepts—such as toxicity or factual accuracy—and adjusts them to encourage or discourage certain outputs.
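The general idea of steering a model by shifting its internal activations can be sketched in a few lines. The snippet below is an illustrative toy, not the UC San Diego team's actual method: it uses a small stand-in network in place of an LLM, and a made-up feature direction in place of a learned concept vector. A forward hook nudges one layer's activations along that direction, which is the basic mechanism behind this style of control.

```python
# Illustrative sketch of activation steering. The model and the
# "feature_dir" vector are hypothetical stand-ins, not the published method.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy 3-layer network standing in for an LLM's stack of layers.
model = nn.Sequential(
    nn.Linear(8, 16), nn.ReLU(),
    nn.Linear(16, 16), nn.ReLU(),
    nn.Linear(16, 4),
)

# Hypothetical feature direction (in practice, identified from the model's
# internal activations, e.g. for a concept like toxicity); here random.
feature_dir = torch.randn(16)
feature_dir = feature_dir / feature_dir.norm()

def make_steering_hook(direction, alpha):
    """Shift a layer's output along `direction` with strength `alpha`.
    Negative alpha suppresses the concept; positive alpha amplifies it."""
    def hook(module, inputs, output):
        return output + alpha * direction  # returned value replaces output
    return hook

x = torch.randn(1, 8)
baseline = model(x)

# Steer the middle layer away from the hypothetical concept.
handle = model[2].register_forward_hook(
    make_steering_hook(feature_dir, alpha=-3.0)
)
steered = model(x)
handle.remove()  # restore normal behavior

print(torch.allclose(baseline, steered))  # False: the output has shifted
```

In a real LLM the hook would attach to a transformer block's hidden states, and the direction would come from analyzing activations on examples that do and do not exhibit the target concept.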

Practical Applications and Benefits

The technique has demonstrated effectiveness in reducing hallucinations (the generation of false information), harmful responses, and toxic language. It also enhances the AI's grasp of diverse language styles, including Shakespearean English and poetic expression.

One significant advantage of this approach is increased efficiency. By focusing on crucial internal features, fine-tuning requires less data and computational power, potentially lowering costs and making advanced AI technology more accessible.

This paves the way for creating AI applications specialized for particular tasks—like an assistant providing accurate medical information or creative AI tools that avoid stereotypes and clichés.

Collaboration and Resources

The research team includes experts from UC San Diego, MIT, Harvard, and the Broad Institute. Their work builds on recent publications in Science and PNAS, supported by organizations such as the National Science Foundation, the Simons Foundation, and the Office of Naval Research.

Computational resources were provided by the San Diego Supercomputer Center and the National Center for Supercomputing Applications at the University of Illinois.

The team has made their code publicly available to encourage further development in AI safety and control. Interested researchers can access the resources on Belkin’s website.

Looking Ahead

Rajesh Gupta, interim dean of the School of Computing, Information and Data Sciences (SCIDS) and founding director of HDSI at UC San Diego, highlighted the importance of controlling AI behavior as these technologies become more integrated into daily life. This research represents a meaningful step toward building AI systems that are safer, more trustworthy, and more useful across various domains.

For those interested in expanding their skills related to AI and large language models, exploring specialized courses can be valuable. Relevant training can be found at Complete AI Training - Latest AI Courses.

