US Models Maintain Lead Over Chinese AI, According to Government Benchmark
A US government benchmark shows Deepseek V4 Pro lags US models by roughly eight months in overall capability. The Center for AI Standards and Innovation (CAISI) tested the Chinese open-weight model across cybersecurity, software development, math, natural sciences, and abstract reasoning.
CAISI calls Deepseek V4 the most capable Chinese model to date. Yet in CAISI's private testing, it performed worse than Deepseek's own technical report suggested. The company positioned the model as roughly equivalent to US offerings like Opus 4.6 and GPT-5.4. CAISI instead found it closer to the older GPT-5.
The gap appears widest in abstract reasoning, cybersecurity, and software development. Math is the one area where Deepseek V4 nearly matches top US models.
CAISI reports the gap between US and Chinese models continues to widen. Independent measurements, however, show the gap has remained roughly constant.
Price becomes a competitive factor
Deepseek V4 holds a clear cost advantage. It came in cheaper than GPT-5.4 mini in five of seven tests.
Price matters more as models handle longer tasks and more complex work. Top US models keep getting pricier while questions linger about actual productivity gains. Businesses lack reliable ways to measure return on investment once training, upskilling, and error checking are factored in.
Below a certain capability threshold, lower-cost models may appeal more than premium options. Cursor, a coding tool reportedly being acquired by SpaceX, built its custom model on a Chinese open-weight foundation to undercut OpenAI and Anthropic pricing.
The smarter-versus-cheaper debate
OpenAI CEO Sam Altman recently said he wants models to be cheaper and faster, but acknowledged that being smarter remains the priority. His reasoning: smarter models could accelerate their own development.
OpenAI, Anthropic, and Chinese developers have all said their models are already speeding up internal R&D work.
Government agencies evaluating AI tools should track both performance gaps and cost trajectories as the market evolves.