PromptPwnd: Malicious PRs and Issues Trick AI Into Running Privileged Commands

Malicious issues or PRs can coax AI in your CI/CD into running privileged commands; researchers call it "PromptPwnd." Keep untrusted text out of prompts, gate actions, and tighten tokens.

Published on: Dec 05, 2025

PromptPwnd: How malicious issues and PRs can hijack AI in your CI/CD

Researchers uncovered a simple but costly trap: malicious text in GitHub issues or pull requests can trick AI agents inside CI/CD workflows into running privileged commands. The attack pattern, dubbed "PromptPwnd," targets pipelines wired to AI tools and fed with untrusted user content.

If your workflow lets the AI take actions with write scopes or cloud access, you're exposed. That can mean unintended commits, leaking secrets, or other high-impact changes.

What's actually happening

Many teams pair GitHub Actions or GitLab CI/CD with AI helpers like Gemini CLI, Claude Code Actions, OpenAI Codex Actions, or GitHub AI Inference. The problem starts when issue bodies, PR descriptions, or commit messages flow straight into prompts. The model responds, and the workflow uses that response to run commands or make edits.

Attackers can hide instructions inside a normal-looking issue or PR. The model "helpfully" follows them. The pipeline executes the output with powerful tokens. Bad day.
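For illustration, here is a minimal sketch of the vulnerable shape. The workflow name and the action reference (example/ai-agent-action) are placeholders standing in for any AI agent step; the point is the pattern: attacker-controlled issue text is interpolated straight into the prompt while the step holds a write-scoped token.

```yaml
# Hypothetical vulnerable workflow: untrusted issue text flows into an AI step
# that holds a write-scoped token. The action name is a placeholder.
name: ai-triage
on:
  issues:
    types: [opened]

permissions:
  contents: write        # broad scopes available to the AI step
  issues: write

jobs:
  triage:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Ask the AI to handle the issue
        uses: example/ai-agent-action@v1   # placeholder for any AI agent action
        with:
          # Attacker-controlled text is injected directly into the prompt.
          prompt: |
            Triage this issue and apply any fixes it requests:
            ${{ github.event.issue.body }}
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

Anything the attacker writes in the issue body becomes part of the model's instructions, and whatever the model decides to do runs with that token.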

Why PromptPwnd works

  • AI steps run with high-privilege tokens (GITHUB_TOKEN, cloud keys).
  • Prompts embed user-controlled fields (issue/PR text, commits, comments).
  • Model output is executed or trusted without review.

In tests, researchers showed Gemini CLI could be nudged via a crafted issue to run attacker-supplied commands and reveal credentials. The same architectural pattern shows up across multiple AI-powered Actions. Vendors have been notified.

Who's at risk

Any repo where untrusted users can open issues or PRs and where AI-driven steps have meaningful permissions. Some workflows require collaborator-level access to abuse; others can be triggered by anyone filing an issue.

Mitigation plan

  • Avoid piping raw user content into prompts. Don't insert issue bodies, PR descriptions, or commit messages directly. If you must, strip risky markers and summarize in a separate, unprivileged job (a two-job sketch with an approval gate follows this list).
  • Treat model output as untrusted. Require human approval before running commands, committing changes, or exposing data derived from AI responses.
  • Constrain AI capabilities. Disable shell/tool execution by default. Use allowlists for commands, files, and repos. Run in a sandboxed container with minimal network access.
  • Harden tokens. Use least-privilege, short-lived credentials. Set read-only permissions by default and elevate per job only when needed. Keep cloud keys out of AI steps.
  • Event hygiene. Don't trigger privileged jobs directly on issues, issue_comment, or pull_request events without checks. Use pull_request_target with caution and strict permissions, since it runs with repository secrets and a write-capable token even for pull requests from forks.
  • Isolate untrusted context. Split "AI triage" from "privileged actions." Pass only vetted summaries or structured data between jobs via artifacts.
  • Add guardrails. Log prompts and responses. Fail the job if model output contains shell metacharacters, dangerous flags, or sensitive paths (a sample check appears after this list).
  • Review prompt templates. Remove user-controlled fields from prompts, or gate them behind strict filters and explicit labels.
  • Scan workflows. Use rules (e.g., Opengrep) to detect AI steps that ingest untrusted input or run with elevated scopes (an example rule is sketched below).
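Here is a minimal sketch of the split-and-gate pattern referenced above: an unprivileged job produces a vetted summary as an artifact, and a separate privileged job acts on it only after passing a protected environment. The job names, the environment name (protected-ai-actions), and the echo placeholders are assumptions; swap in your own AI tool with shell/tool execution disabled.

```yaml
# Sketch: unprivileged AI triage, then a human-approved privileged job.
# Job names, the environment name, and the placeholder commands are illustrative.
name: ai-triage-split
on:
  issues:
    types: [opened]

permissions: {}               # nothing granted by default

jobs:
  summarize:                  # no secrets, no write scopes
    runs-on: ubuntu-latest
    permissions:
      issues: read
    steps:
      - name: Produce a structured summary only
        run: |
          # Placeholder: call your AI tool here with shell/tool execution disabled
          # and capture a plain-text summary instead of executing anything.
          echo "summary: needs-triage" > summary.txt
      - uses: actions/upload-artifact@v4
        with:
          name: triage-summary
          path: summary.txt

  act:                        # privileged, gated behind required reviewers
    needs: summarize
    runs-on: ubuntu-latest
    environment: protected-ai-actions   # assumed environment with required reviewers
    permissions:
      contents: write
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: triage-summary
      - name: Apply vetted changes only
        run: echo "act on the reviewed summary here, never on raw model output"
```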
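For the guardrail bullet, a rough sketch of a pre-execution check that could sit in the privileged job above before anything derived from model output runs. The model_output.txt filename and the pattern list are assumptions; tune both for your environment.

```yaml
# Sketch: a step that refuses to continue if model output looks executable
# or touches sensitive paths. Filename and patterns are illustrative.
- name: Guardrail check on model output
  run: |
    if grep -E -q '[;&|`$<>]|--force|/etc/|\.ssh|\.github/workflows' model_output.txt; then
      echo "::error::Model output contains risky tokens; refusing to run it."
      exit 1
    fi
```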
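For workflow scanning, a rough sketch of a rule in Semgrep-compatible syntax, which Opengrep accepts. The rule id, message, paths, and regex are illustrative and will need tuning for your repositories.

```yaml
# Sketch of an Opengrep/Semgrep-style rule flagging workflows that interpolate
# untrusted issue, PR, or comment text. All names and patterns are illustrative.
rules:
  - id: workflow-interpolates-untrusted-event-text
    languages: [regex]
    severity: WARNING
    message: >
      Workflow interpolates issue/PR-controlled text; if this feeds an AI step
      or a run command, treat it as untrusted input.
    paths:
      include:
        - .github/workflows/*.yml
        - .github/workflows/*.yaml
    pattern-regex: '\$\{\{\s*github\.event\.(issue|pull_request|comment)\.(body|title)\s*\}\}'
```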

Quick checks you can run today

  • Search workflow files for references like github.event.issue.body, github.event.pull_request.body, or commit messages feeding AI prompts.
  • Verify that workflows set a default of permissions: contents: read, and elevate only for the specific jobs that need more (see the snippet below).
  • Add required approvals or protected environments before any AI-driven write, secret read, or deploy step.
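A minimal sketch of the read-only default with per-job elevation; the workflow and job names are placeholders.

```yaml
# Read-only default at the workflow level; one job elevates only what it needs.
name: labeler                  # placeholder workflow
on:
  issues:
    types: [opened]

permissions:
  contents: read               # workflow-wide read-only default

jobs:
  label:                       # placeholder job name
    runs-on: ubuntu-latest
    permissions:
      issues: write            # elevate only in the job that needs it
    steps:
      - run: echo "labeling logic goes here"
```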


Bottom line

PromptPwnd isn't a model bug; it's a workflow design flaw. Keep untrusted user text out of prompts, treat AI output like input from the internet, and lock down permissions. Small changes in your YAML can prevent big incidents.

