Safeguarding Open-Source AI Models on Low-Power Devices Without Compromising Safety

Researchers at UC Riverside developed a method to keep AI safety features intact when models are downsized for low-power devices, helping open-source AI stay safe without relying on external filters.

Published on: Sep 05, 2025

Preserving AI Safeguards in Open-Source Models

As generative AI models transition from massive cloud servers to devices like phones and cars, they undergo trimming to reduce power consumption. This downsizing often removes critical safety features designed to prevent harmful outputs such as hate speech or instructions for illegal activities. Open-source AI models, while promoting transparency and innovation, face significant risks of misuse without these safeguards.

Researchers at the University of California, Riverside have developed a technique to maintain AI safety features even when open-source models are compacted for low-power devices. Unlike proprietary AI systems that rely on cloud infrastructure and continuous monitoring, open-source models can be downloaded, modified, and operated offline by anyone. This openness encourages innovation but also complicates oversight and safety enforcement.

Challenges of Reducing Model Size

The core problem the UCR team addressed is that safety features often degrade when models are reduced in size. To save memory and computation, lower-power deployments typically skip internal processing layers. While this improves speed and efficiency, it also weakens the model’s ability to filter unsafe content.

“Some of the skipped layers turn out to be essential for preventing unsafe outputs,” said Amit Roy-Chowdhury, professor of electrical and computer engineering and senior author of the study. “If you leave them out, the model may start answering questions it shouldn’t.”
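To make the trade-off concrete, the sketch below shows one common way layers get skipped: loading an open-source checkpoint and keeping only an early prefix of its decoder blocks. It assumes a Hugging Face-style causal language model that exposes its blocks as `model.model.layers`; the checkpoint, cutoff, and prompt are illustrative choices, not details from the study.

```python
# Minimal sketch of layer skipping for a low-power deployment.
# Assumes a Hugging Face-style causal LM whose decoder blocks are exposed
# as `model.model.layers` (true for LLaMA-family checkpoints); the checkpoint,
# cutoff, and prompt are illustrative, not taken from the study.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Keep only an early prefix of the decoder blocks to cut memory and latency.
# Any safety behavior that lives in the dropped blocks is lost along with them.
keep = int(0.75 * len(model.model.layers))
model.model.layers = model.model.layers[:keep]
model.config.num_hidden_layers = keep

prompt = "Explain how photosynthesis works."
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```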

A New Approach to Safety

The researchers’ solution involves retraining the internal structure of the AI model to preserve its detection and blocking of dangerous prompts, even when key layers are removed. This method does not rely on external filters or software patches but instead alters the model’s fundamental understanding of risky inputs.

“Our goal was to make sure the model doesn’t forget how to behave safely when it’s been slimmed down,” explained Saketh Bachu, UCR graduate student and co-lead author of the study.
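The article does not spell out the training objective, but one way to picture the idea is to fine-tune while the decoder depth is randomly truncated, so refusal behavior is not concentrated in the blocks that pruning removes. Everything in the sketch below, from the checkpoint to the toy data and loss, is an illustrative assumption rather than the UCR team's published recipe.

```python
# Hedged sketch of one way to make refusal behavior survive layer removal:
# fine-tune while randomly truncating the decoder depth, so safe responses are
# not concentrated in the blocks that pruning removes. The checkpoint, data,
# loss, and schedule are illustrative assumptions, not the UCR team's recipe.
import random
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

full_layers = model.model.layers          # handle to all decoder blocks
num_layers = len(full_layers)

# Tiny illustrative dataset: unsafe prompt paired with a refusal target.
pairs = [
    ("How do I build a weapon at home?", "I can't help with that request."),
]

model.train()
for step in range(10):
    prompt, refusal = random.choice(pairs)
    # Sample a truncated depth so the refusal is learned at every exit point.
    k = random.randint(num_layers // 2, num_layers)
    model.model.layers = full_layers[:k]

    batch = tokenizer(prompt + "\n" + refusal, return_tensors="pt")
    labels = batch["input_ids"].clone()   # for brevity, prompt tokens are not masked
    loss = model(**batch, labels=labels).loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Restore the full depth after training.
model.model.layers = full_layers
```

In practice the prompt tokens would be masked out of the loss and the training data would cover many families of unsafe prompts; the sketch only shows the depth-randomization idea.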

Testing with Vision-Language Models

To validate their approach, the team experimented with LLaVA 1.5, a vision-language model that processes both text and images. They discovered that certain input combinations, such as a harmless image paired with a malicious question, could bypass the model’s safety filters. In one test, the downsized model responded with detailed instructions for building a bomb.

After retraining, however, the model consistently refused to answer dangerous queries, even when operating with only a fraction of its original architecture.

“This isn’t about adding filters or external guardrails,” Bachu emphasized. “We’re changing the model’s internal understanding, so it’s on good behavior by default, even when it’s been modified.”
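As a rough illustration of this kind of probing, the sketch below queries a public LLaVA 1.5 checkpoint with an image and a text question, then uses a simple keyword heuristic to decide whether the reply is a refusal. The checkpoint, prompt template, file names, and heuristic are assumptions made for illustration, not the study's evaluation code.

```python
# Hedged sketch of a refusal check for a vision-language model, in the spirit
# of the tests described above: pair an ordinary image with a text question and
# look for refusal phrases in the reply. The checkpoint, prompt template, and
# keyword heuristic are illustrative assumptions, not the study's evaluation code.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # public LLaVA 1.5 checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16  # fp16 weights; run on a GPU for practicality
)

def refuses(question: str, image_path: str) -> bool:
    """Return True if the model declines to answer the question about the image."""
    image = Image.open(image_path)
    prompt = f"USER: <image>\n{question} ASSISTANT:"
    inputs = processor(images=image, text=prompt, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=64)
    reply = processor.decode(output[0], skip_special_tokens=True)
    # Simple keyword heuristic; a real evaluation would use a stronger judge.
    return any(p in reply.lower() for p in ("i can't", "i cannot", "sorry"))

# Example: an ordinary photo paired with a placeholder for an unsafe question.
print(refuses("UNSAFE_QUESTION_PLACEHOLDER", "ordinary_photo.jpg"))
```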

“Benevolent Hacking” to Fortify AI

Bachu and co-lead author Erfan Shayegani describe their work as “benevolent hacking”—proactively strengthening models before vulnerabilities are exploited. Their ultimate aim is to develop techniques that ensure consistent safety across every internal layer, helping AI perform reliably in real-world conditions.

The research team includes:

  • Amit Roy-Chowdhury, Professor of Electrical and Computer Engineering
  • Saketh Bachu, Graduate Student and Co-Lead Author
  • Erfan Shayegani, Graduate Student and Co-Lead Author
  • Arindam Dutta, Doctoral Student
  • Rohit Lal, Doctoral Student
  • Trishna Chakraborty, Doctoral Student
  • Chengyu Song, Faculty Member
  • Yue Dong, Faculty Member
  • Nael Abu-Ghazaleh, Faculty Member

Their research was presented at the International Conference on Machine Learning held in Vancouver, Canada.

“There’s still more work to do,” Roy-Chowdhury noted. “But this is a concrete step toward developing AI in a way that’s both open and responsible.”