US government report calls Chinese AI a security risk as DeepSeek lags on performance, cost, and safety

US CAISI says Chinese models like DeepSeek are 'adversary AI' with security gaps, censorship, and weaker performance. Agencies should treat them as untrusted, test them rigorously, and contain their use.

Published on: Oct 03, 2025

US report labels Chinese AI "adversary AI": what government teams need to do now

A new US government assessment from the Center for AI Standards and Innovation (CAISI) says Chinese AI models present risks to developers, consumers, and US national security due to security gaps and built-in censorship. The report names DeepSeek directly and finds Chinese models trail US systems on performance, cost, security, and adoption, despite fast growth.

For agencies, this is a procurement and risk management problem, not a headline. Treat foreign AI as untrusted by default. Validate claims with hard data, and contain exposure.

What CAISI found

CAISI's evaluation compared DeepSeek to top US models, including Anthropic's Claude Opus 4, OpenAI's GPT-5, and OpenAI's open-weight gpt-oss. DeepSeek scored lower on nearly all of the 19 public and internal benchmarks and was more susceptible to the kinds of jailbreaks that hackers and cybercriminals exploit.

Adoption is surging: DeepSeek downloads on Hugging Face are up nearly 1,000% this year, and Alibaba Cloud's Qwen family is up 135%. Qwen is closing in on Meta's Llama as the second-most popular model of all time, though US firms still lead total global downloads. More derivative models have been built on Qwen than on models from Google, Meta, Microsoft, and OpenAI combined.

Cost matters: the study says OpenAI's GPT-5-mini is, on average, 35% cheaper than DeepSeek's top model (V3.1) for similar performance when accessed via APIs. The report doesn't emphasize a key caveat: DeepSeek's open-weight models can be run locally, while proprietary US models generally require paid API access.
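As a back-of-the-envelope illustration of how a 35% price gap compounds at volume, the sketch below uses hypothetical per-token prices and an assumed workload; none of the numbers come from the CAISI report.

```python
# Hypothetical USD prices per 1M tokens -- placeholders, not report figures.
# 2.00 vs. 3.08 puts model_a at ~35% cheaper (2.00 / 3.08 is ~0.65).
PRICE_PER_M_TOKENS = {"model_a": 2.00, "model_b": 3.08}

MONTHLY_TOKENS = 500_000_000  # assumed workload: 500M tokens per month

for model, price in PRICE_PER_M_TOKENS.items():
    cost = MONTHLY_TOKENS / 1_000_000 * price
    print(f"{model}: ${cost:,.0f} per month")
```

Under those assumed numbers the gap is roughly $540 per month per 500M tokens. The point is that per-token deltas become material at agency scale, which is why the checklist below asks for total cost of ownership rather than list price.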

Artificial Analysis reports that DeepSeek released newer models in recent weeks, cutting official API prices by over 50% while maintaining performance, but CAISI evaluated only R1, R1-0528, and V3.1. Expect pricing to keep moving.

Political and policy context

The findings follow President Donald Trump's AI Action Plan from July, which called for assessing the capabilities and alignment of frontier Chinese models. CAISI states Chinese government filtering is "built directly into DeepSeek models."

US Commerce Secretary Howard Lutnick wrote that the department is working to ensure "continued US leadership in AI," adding: "DeepSeek lags far behind, especially in cyber and software engineering… These weaknesses aren't just technical. They demonstrate why relying on foreign AI is dangerous and shortsighted."

DeepSeek has also been accused in the US of stealing user data and amplifying Chinese state narratives. Treat these as allegations and fold them into your risk evaluation and vendor due diligence.

Why this matters for federal, state, and local agencies

This is about supply chain security, sensitive data, and mission continuity. Open-weight models can lower dependency on vendor APIs but shift the security burden to your team. Foreign-hosted endpoints raise surveillance and data residency risks.

Standardize how you evaluate models, especially models tagged as adversary AI. Write clear thresholds for jailbreak rates, incident response, and data handling before any deployment touches production.
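One way to make those thresholds enforceable is a machine-readable deployment gate that a review board or CI pipeline checks before promotion. A minimal sketch in Python, with illustrative field names and limits; none of these values come from the CAISI report.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DeploymentGate:
    """Pre-production gate for an AI model. All values are illustrative."""
    max_jailbreak_rate: float         # fraction of red-team prompts that succeed
    max_incident_response_hours: int  # vendor SLA to triage a safety exploit
    allow_foreign_hosted: bool        # whether foreign-hosted endpoints are permitted
    data_residency: str               # required hosting region for agency data

# Example policy for a moderate-sensitivity workload (assumed numbers).
GATE = DeploymentGate(
    max_jailbreak_rate=0.02,
    max_incident_response_hours=72,
    allow_foreign_hosted=False,
    data_residency="US",
)

def passes_gate(jailbreak_rate: float, foreign_hosted: bool, region: str) -> bool:
    """Return True only if measured results satisfy the written policy."""
    return (
        jailbreak_rate <= GATE.max_jailbreak_rate
        and (GATE.allow_foreign_hosted or not foreign_hosted)
        and region == GATE.data_residency
    )
```

Writing the gate down as code forces the thresholds to be explicit and auditable rather than scattered across memos.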

Action checklist for government buyers and CISOs

  • Demand independent red-team results, jailbreak rates, and safety system details mapped to the NIST AI Risk Management Framework.
  • For open-weight installs: require signed model artifacts, hashes, and an SBOM; isolate on segmented networks; disable outbound egress by default (see the verification sketch after this list).
  • Ban sending CUI, PII, or mission data to foreign-hosted endpoints; enforce data residency and detailed access logs with retention controls.
  • Test for censorship bias and political filtering; document where model filters may suppress or skew results relevant to your mission.
  • Set a jailbreak tolerance threshold (e.g., below X% across Y tests) and require fix SLAs for discovered exploits.
  • Compare total cost of ownership: API fees vs. local compute, patching, monitoring, and compliance audits.
  • Establish quarterly re-evaluations; track price/performance updates (e.g., DeepSeek's recent cuts) without trading away security.
  • Restrict use of foreign AI for cyber operations and software engineering until it passes agency-defined security gates.
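For the signed-artifact requirement in the checklist, a verification step can run before any open-weight model is loaded. A minimal sketch, assuming a hypothetical JSON manifest of SHA-256 digests; signature validation against your agency's PKI would sit alongside this check.

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large model shards don't exhaust RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_artifacts(model_dir: Path, manifest_path: Path) -> bool:
    """Compare every artifact against the vendor-signed manifest.

    Assumed manifest format (illustrative):
    {"files": {"model-00001.safetensors": "<hex digest>", ...}}
    """
    manifest = json.loads(manifest_path.read_text())
    ok = True
    for name, expected in manifest["files"].items():
        if sha256_of(model_dir / name) != expected:
            print(f"MISMATCH: {name}")
            ok = False
    return ok

# Hypothetical paths; refuse to load on any mismatch.
# if not verify_artifacts(Path("/models/vendor-x"), Path("/models/vendor-x/manifest.json")):
#     raise SystemExit("Artifact verification failed; do not deploy.")
```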

Metrics to request from vendors

  • Full benchmark suite and raw scores used for claims (including the 19 benchmarks in scope for CAISI-style testing); a structured intake format, like the schema sketch after this list, keeps submissions comparable.
  • Jailbreak rate, abuse mitigation methods, and patch cadence for safety vulnerabilities.
  • Data collection, retention, and model training reuse policies; clear "no-train-on-your-data" by contract where required.
  • Model provenance, versioning, and change logs; audit trails for content filters and alignment settings.
  • Independent audits or certifications and references from comparable public-sector deployments.
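To keep vendor answers comparable across procurements, the requested metrics can be captured in a fixed schema. A sketch with illustrative fields and sample values; the field names and numbers are assumptions, not a mandated format.

```python
from dataclasses import dataclass, field

@dataclass
class VendorAIMetrics:
    """Structured intake for the vendor metrics above. Illustrative fields."""
    benchmark_scores: dict[str, float]   # benchmark name -> raw score
    jailbreak_rate: float                # fraction of red-team attempts that succeed
    safety_patch_cadence_days: int       # typical days from report to fix
    trains_on_customer_data: bool        # must be False where contracts require it
    model_version: str                   # exact version/revision being offered
    change_log_url: str                  # provenance and alignment-change audit trail
    independent_audits: list[str] = field(default_factory=list)

# Hypothetical submission for illustration only.
sample = VendorAIMetrics(
    benchmark_scores={"cyber_eval": 0.71, "swe_eval": 0.64},
    jailbreak_rate=0.015,
    safety_patch_cadence_days=14,
    trains_on_customer_data=False,
    model_version="v3.1-2025-09",
    change_log_url="https://example.com/changelog",  # placeholder URL
    independent_audits=["SOC 2 Type II"],
)
```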

What to watch next

Further CAISI evaluations may include newly released DeepSeek models and updated prices. Expect more scrutiny on open-weight adoption, censorship behavior, and jailbreak resilience across vendors.

Hugging Face trends will continue to influence developers and integrators. If your teams prototype there, enforce strict policies on data, access, and model selection. Explore the Hugging Face model hub only for non-sensitive projects and in sandboxed environments.
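Where hub prototyping is permitted, pinning an exact, reviewed model revision and downloading into an isolated directory is one way to enforce model selection. A minimal sketch using the huggingface_hub Python client, with a placeholder repository ID and revision.

```python
from huggingface_hub import snapshot_download

# Placeholder repo ID and revision -- substitute an approved model and the exact
# commit hash your review process vetted, so upstream changes can't slip in later.
APPROVED_REPO = "org/approved-model"  # hypothetical
APPROVED_REVISION = "abc123def456"    # pin to a reviewed commit, not a branch

local_path = snapshot_download(
    repo_id=APPROVED_REPO,
    revision=APPROVED_REVISION,
    local_dir="/sandbox/models/approved-model",  # segmented, non-sensitive environment
)
print(f"Model materials pinned at: {local_path}")
```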

Upskilling your team

If your agency needs structured training on AI procurement, model evaluation, and safe deployment, see our curriculum by role at Complete AI Training.

Bottom line: treat foreign AI as high-risk by default, verify with independent testing, and contain it with strict controls. No model is worth a breach, a leak, or a mission failure.