Chinese AIs DeepSeek and Qwen Outtrade ChatGPT and Gemini in Real-Money Crypto Challenge

Chinese models lead a live crypto trial so far: DeepSeek at $21.6k and Qwen at $17k, while Gemini 2.5 Pro and ChatGPT 5 are deep in the red. Results land Nov. 3; treat them as signal, not gospel.

Categorized in: AI News, Finance
Published on: Nov 02, 2025

Chinese AI Models Are Leading a Real-Money Crypto Trading Trial: What Finance Teams Should Take From It

In Nof1's Alpha Arena, a live crypto trading contest that started on Oct. 17, several frontier AI models were each given $10,000, identical prompts, and the same data. The goal: maximize returns trading on the decentralized exchange Hyperliquid.

As of the latest update, DeepSeek V3.1 Chat is out in front at $21,600 (+116%). Qwen 3 Max is second at roughly $17,000 (+70%). Claude 4.5 Sonnet and Grok 4 are battling for third and fourth with gains of 11% and 4%, respectively. The laggards are Gemini 2.5 Pro and ChatGPT 5, both down more than 60%.

Nof1 noted that GPT-5 and Gemini 2.5 Pro often chose smaller position sizes, less aggressive than in prior test runs, which likely weighed on performance. One industry voice suggested Chinese models may benefit from training on crypto-native, Asia-facing forums; DeepSeek is reportedly a side project of a quantitative trading shop. Others argue the results could be little more than a random walk, with average performance drifting back toward the starting point over time, a reminder to resist reading too much into short windows (see Random Walk Theory).

The contest ends on Nov. 3, so there's still room for reshuffling. Either way, it's a useful stress test for how different models handle risk, sizing, and execution under the same constraints.

Why might Chinese models be ahead?

  • Domain exposure: more training on crypto-native discussions can shape heuristics and trade selection.
  • Quant DNA: if a model inherits practices from a trading firm, you might see tighter execution and clearer sizing rules.
  • Risk appetite: aggressiveness (or lack of it) shows up fast when markets move.

How finance teams can use this (without overfitting)

  • Treat contests as signal, not gospel. Short samples can look brilliant or disastrous by chance.
  • Judge the policy, not just P&L: position sizing, leverage, and execution logic matter more than one leaderboard.
  • Codify guardrails: max allocation per trade, daily loss limits, kill switches, liquidity thresholds, slippage and fee models (a minimal pre-trade check is sketched after this list).
  • Evaluate with full risk metrics: Sharpe/Sortino, max drawdown, turnover, win/loss ratio, and tail behavior (a small computation sketch also follows the list).
  • Tune models with domain data (funding rates, basis, order-book signals, on-chain flows) and validate out of sample.
  • Separate domains: a model that trades crypto well may not pick equities well. One public test saw ChatGPT's small-cap picks slide to $76 from $100, while the S&P 500 would have reached $109.46 in the same span.
  • Keep a human in the loop. Use AI for idea generation and execution support; keep oversight on sizing and risk.
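
To make the guardrail point concrete, here is a minimal pre-trade check in Python. The parameter names and thresholds are illustrative assumptions, not Nof1's setup or any production policy; the idea is simply that a model's proposed order only reaches the exchange if every rule passes.

```python
from dataclasses import dataclass

# Hypothetical guardrail parameters -- tune to your own risk policy.
@dataclass
class Guardrails:
    max_alloc_per_trade: float = 0.05   # max 5% of equity in any single position
    daily_loss_limit: float = 0.02      # halt trading after a 2% daily drawdown
    max_leverage: float = 2.0           # cap gross exposure at 2x equity
    min_liquidity_usd: float = 250_000  # skip markets thinner than this
    max_slippage_bps: float = 20        # reject fills expected to slip > 20 bps

def pre_trade_check(order_notional: float, equity: float, daily_pnl: float,
                    gross_exposure: float, market_liquidity_usd: float,
                    est_slippage_bps: float, g: Guardrails) -> tuple[bool, str]:
    """Return (allowed, reason); any failed rule blocks the order."""
    if daily_pnl <= -g.daily_loss_limit * equity:
        return False, "daily loss limit hit: kill switch engaged"
    if order_notional > g.max_alloc_per_trade * equity:
        return False, "order exceeds per-trade allocation cap"
    if gross_exposure + order_notional > g.max_leverage * equity:
        return False, "order would breach leverage cap"
    if market_liquidity_usd < g.min_liquidity_usd:
        return False, "market below liquidity threshold"
    if est_slippage_bps > g.max_slippage_bps:
        return False, "expected slippage too high"
    return True, "ok"

# Example: a $1,200 order against $10,000 equity fails the 5% per-trade cap.
print(pre_trade_check(1_200, 10_000, -50, 3_000, 400_000, 8, Guardrails()))
```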
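And here is a short sketch of how the risk metrics in the list might be computed from a strategy's per-period returns. The function name and settings are assumptions for illustration, using standard formulas: annualized Sharpe and Sortino with a zero risk-free rate, and max drawdown taken from the compounded equity curve.

```python
import numpy as np

def risk_report(returns: np.ndarray, periods_per_year: int = 365) -> dict:
    """Summarize per-period (e.g. daily) strategy returns."""
    mean, std = returns.mean(), returns.std(ddof=1)
    downside = returns[returns < 0]
    downside_std = downside.std(ddof=1) if len(downside) > 1 else np.nan

    equity = np.cumprod(1 + returns)                    # compounded equity curve
    drawdowns = equity / np.maximum.accumulate(equity) - 1
    wins, losses = returns[returns > 0], returns[returns < 0]

    return {
        "sharpe": mean / std * np.sqrt(periods_per_year),
        "sortino": mean / downside_std * np.sqrt(periods_per_year),
        "max_drawdown": drawdowns.min(),
        "win_rate": len(wins) / len(returns),
        "avg_win_loss_ratio": wins.mean() / abs(losses.mean()),
        "worst_period": returns.min(),                  # crude tail indicator
    }

# Example with synthetic daily returns; real evaluation needs out-of-sample data.
rng = np.random.default_rng(0)
print(risk_report(rng.normal(0.001, 0.03, 250)))
```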

Context: AI trading research cuts both ways

Evidence is mixed. One Stanford study reported that a model trained only on public information beat 93% of managers over 30 years, outperforming by an average of 600%. Meanwhile, a retail experiment with ChatGPT stock picks underperformed a simple index approach. Domain, timeframe, and risk policy make or break outcomes.

If you're exploring model-driven workflows for markets, map use cases to guardrails before capital goes live. Start small, measure hard, and scale only what survives out-of-sample testing.

For a practical overview of market-focused AI software, see this curated list: AI tools for finance.

