Chinese AI Models Are Leading a Real-Money Crypto Trading Trial - What Finance Teams Should Take From It
In Nof1's Alpha Arena, a live crypto trading contest that started on Oct. 17, several frontier AI models were each given $10,000, identical prompts, and the same market data. The goal: maximize returns trading on the decentralized exchange Hyperliquid.
As of the latest update, DeepSeek V3.1 Chat is out in front at $21,600 - a 116% gain. Qwen 3 Max is second at roughly $17,000 (+70%). Claude 4.5 Sonnet and Grok 4 are battling for third and fourth with 11% and 4% gains, respectively. The laggards are Gemini 2.5 Pro and GPT-5, both down more than 60%.
Nof1 noted that GPT-5 and Gemini 2.5 Pro often chose smaller position sizes - less aggressive than in prior test runs - which likely weighed on performance. One industry voice suggested Chinese models may benefit from training on crypto-native, Asia-facing forums; DeepSeek is reportedly a side project of a quantitative trading shop. Others argue the results are consistent with a random walk: if short-run returns are essentially noise, the expected outcome stays near the starting stake, and a wide spread of winners and losers can emerge by pure chance - a reminder to resist reading too much into short windows (Random Walk Theory).
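To see why the random-walk caution matters, here is a minimal Monte Carlo sketch in Python. All parameters (trade count, per-trade volatility, field size) are illustrative assumptions, not Nof1's setup: it simulates skill-free traders whose returns are zero-mean noise and measures how big the luckiest one's gain looks.

```python
import random

START = 10_000       # starting bankroll, matching the contest stake
N_TRADERS = 6        # number of simulated "models" (illustrative)
N_TRADES = 100       # short window of trades (assumption)
VOL = 0.05           # per-trade return volatility (assumption)
N_TRIALS = 1_000     # Monte Carlo repetitions

best_gains = []
for _ in range(N_TRIALS):
    finals = []
    for _ in range(N_TRADERS):
        equity = START
        for _ in range(N_TRADES):
            # Zero-mean noise: no skill, no edge.
            equity *= 1 + random.gauss(0, VOL)
        finals.append(equity)
    best_gains.append(max(finals) / START - 1)

avg_best = sum(best_gains) / N_TRIALS
print(f"Average best-of-{N_TRADERS} gain with zero skill: {avg_best:+.0%}")
```

Even with zero edge, the top performer in a small field routinely posts a large gain, which is why one short leaderboard is weak evidence of skill.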
The contest ends on Nov. 3, so there's still room for reshuffling. Either way, it's a useful stress test for how different models handle risk, sizing, and execution under the same constraints.
Why might Chinese models be ahead?
- Domain exposure: more training on crypto-native discussions can shape heuristics and trade selection.
- Quant DNA: if a model's builders come from a quantitative trading firm, its training may bake in tighter execution and clearer sizing rules.
- Risk appetite: aggressiveness (or lack of it) shows up fast when markets move.
How finance teams can use this (without overfitting)
- Treat contests as signal, not gospel. Short samples can look brilliant or disastrous by chance.
- Judge the policy, not just P&L: position sizing, leverage, and execution logic matter more than one leaderboard.
- Codify guardrails: max allocation per trade, daily loss limits, kill switches, liquidity thresholds, slippage and fee models (see the guardrail sketch after this list).
- Evaluate with full risk metrics: Sharpe/Sortino, max drawdown, turnover, win/loss ratio, and tail behavior.
- Tune models with domain data (funding rates, basis, order-book signals, on-chain flows) and validate out of sample; the walk-forward sketch after this list combines both steps.
- Separate domains: a model that trades crypto well may not pick equities well. One public test saw ChatGPT's small-cap picks slide to $76 from $100, while the S&P 500 would have reached $109.46 in the same span.
- Keep a human in the loop. Use AI for idea generation and execution support; keep oversight on sizing and risk.
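Here is a minimal guardrail sketch in Python. The thresholds and field names (max_position_pct, daily_loss_limit_pct, and so on) are illustrative assumptions, not a standard; a production system would also carry venue-specific fee and slippage models.

```python
from dataclasses import dataclass

@dataclass
class Guardrails:
    max_position_pct: float = 0.10      # max fraction of equity per trade (assumption)
    daily_loss_limit_pct: float = 0.03  # stop trading after a 3% daily drawdown
    min_book_depth_usd: float = 50_000  # skip illiquid markets
    max_slippage_bps: float = 20        # reject fills priced past this slippage
    kill_switch: bool = False           # manual override halts all trading

    def allow_trade(self, equity: float, notional: float, daily_pnl: float,
                    book_depth_usd: float, est_slippage_bps: float) -> bool:
        """Return True only if every guardrail passes."""
        if self.kill_switch:
            return False
        if notional > self.max_position_pct * equity:
            return False
        if daily_pnl <= -self.daily_loss_limit_pct * equity:
            return False
        if book_depth_usd < self.min_book_depth_usd:
            return False
        if est_slippage_bps > self.max_slippage_bps:
            return False
        return True

# Example: a $2,000 trade on $10,000 of equity trips the 10% sizing cap.
g = Guardrails()
print(g.allow_trade(equity=10_000, notional=2_000, daily_pnl=-100,
                    book_depth_usd=80_000, est_slippage_bps=8))  # False
```

The point is that every limit is explicit and testable before any model touches capital.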
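And a compact walk-forward sketch that scores only out-of-sample returns with two of the metrics above. The strategy and data are placeholders (assumptions), and the annualization factor assumes daily bars on a 365-day crypto calendar.

```python
import math
import random

def sharpe(returns, periods_per_year=365):
    """Annualized Sharpe ratio (risk-free rate assumed zero)."""
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    std = math.sqrt(var)
    return mean / std * math.sqrt(periods_per_year) if std else 0.0

def max_drawdown(returns):
    """Worst peak-to-trough decline of the compounded equity curve."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1)
    return worst

# Placeholder daily returns; swap in real strategy output.
daily = [random.gauss(0.001, 0.02) for _ in range(730)]

# Walk-forward: tune on each training window, score only the next window.
window, step = 180, 30
oos = []
for start in range(0, len(daily) - window - step, step):
    train = daily[start:start + window]  # fit/tune the model here (stubbed out)
    test = daily[start + window:start + window + step]
    oos.extend(test)                     # keep only unseen returns

print(f"OOS Sharpe: {sharpe(oos):.2f}, max drawdown: {max_drawdown(oos):.1%}")
```

Only returns the model never trained on enter the final metrics - the discipline a live contest, by itself, cannot provide.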
Context: AI trading research cuts both ways
Evidence is mixed. One Stanford study reported that a model trained only on public information beat 93% of fund managers over 30 years, by an average of 600%. Meanwhile, a retail experiment with ChatGPT stock picks underperformed a simple index approach. Domain, timeframe, and risk policy make or break outcomes.
If you're exploring model-driven workflows for markets, map use cases to guardrails before capital goes live. Start small, measure hard, and scale only what survives out-of-sample testing.
For a practical overview of market-focused AI software, see this curated list: AI tools for finance.