AMD opens preorders for Ryzen AI Halo developer platform with 128 GB of unified memory

AMD opened preorders for a $3,999 Ryzen AI Halo mini PC with 128 GB of unified memory to run local AI models. AMD pitches the system as a way for developers to reduce cloud inference costs.

Categorized in: AI News, Product Development
Published on: Jun 14, 2026

AMD has opened preorders for its Ryzen AI Halo developer platform, a compact mini PC priced at $3,999. With 128 GB of unified memory, the system can run larger language models locally, giving teams an alternative to cloud-only workflows on both inference cost and latency.

Hardware specifications

The first retail systems use the Ryzen AI MAX+ 395 system-on-chip, codenamed Strix Halo. This package pairs a 16-core, 32-thread Zen 5 CPU with an RDNA 3.5 integrated GPU with 40 compute units and a 50 TOPS XDNA 2 neural processing unit. The entire configuration fits into a 5.9-inch by 5.9-inch by 1.7-inch chassis and includes 2 TB of NVMe storage. AMD claims the platform can run local models of up to 200 billion parameters.
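
A rough way to sanity-check the 200-billion-parameter claim is to estimate the memory needed for model weights alone at a few precisions. The sketch below is an illustration, not AMD's method: the 16-bit, 8-bit and 4-bit widths are assumptions, since the article does not say which quantization the claim relies on, and the estimate ignores the KV cache, activations and memory reserved for the operating system.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


UNIFIED_MEMORY_GB = 128  # capacity stated in the article

# Bit widths are assumed for illustration; the article does not specify them.
for bits in (16, 8, 4):
    need = weight_memory_gb(200, bits)
    verdict = "fits" if need < UNIFIED_MEMORY_GB else "does not fit"
    print(f"200B parameters at {bits}-bit: ~{need:.0f} GB of weights, "
          f"{verdict} in {UNIFIED_MEMORY_GB} GB")
```

Under these assumptions only the 4-bit case, at roughly 100 GB of weights, fits within 128 GB, which suggests the headline figure depends on aggressive quantization.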

Performance and software environment

AMD positions the device for development workflows using ROCm and a curated software stack. Vendor materials include tokens-per-second benchmark comparisons against NVIDIA's DGX Spark, showing single-model performance gains of up to 14 percent in specific workloads. The company offers both Windows and Linux configurations, with initial preorders available through Micro Center starting in June.

For product development teams, the unified-memory architecture reduces data-movement overhead for medium-to-large models, because the CPU and GPU share one memory pool instead of copying data between separate system RAM and GPU memory. This setup often improves token throughput per dollar in on-premises scenarios compared with systems that split memory between the two.

Market context and limitations

This release competes directly on price and form factor with specialist systems like NVIDIA's DGX Spark. AMD's pitch emphasizes cost-effective local compute, with company materials showing a break-even point against cloud inference costs at moderate token volumes. However, industry observers will watch three specific areas closely.

  • Software maturity: Whether ROCm drivers and system management tools run reliably across multiple Linux distributions, as mainline kernel integration remains pending.
  • Real-world throughput: Independent benchmarks outside vendor materials to verify throughput claims for common models like SDXL and Qwen.
  • Ecosystem adoption: Availability through retail partners and whether third-party vendors offer additional configurations for enterprise use.
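
The break-even point against cloud inference mentioned above comes down to simple arithmetic. The sketch below is an illustration rather than AMD's calculation: the $3,999 price comes from the article, while the cloud rate of $2.00 per million tokens and the zero local running cost are hypothetical placeholders to replace with real figures.

```python
def break_even_tokens(hardware_cost_usd: float,
                      cloud_usd_per_million_tokens: float,
                      local_usd_per_million_tokens: float = 0.0) -> float:
    """Tokens that must be processed locally before the hardware pays for itself."""
    saving_per_million = cloud_usd_per_million_tokens - local_usd_per_million_tokens
    if saving_per_million <= 0:
        raise ValueError("local inference must be cheaper per token than cloud")
    return hardware_cost_usd / saving_per_million * 1e6


HARDWARE_COST_USD = 3999.0   # price stated in the article
CLOUD_RATE_USD = 2.0         # hypothetical cloud price per million tokens

tokens = break_even_tokens(HARDWARE_COST_USD, CLOUD_RATE_USD)
print(f"Break-even at about {tokens / 1e9:.2f} billion tokens")
```

Setting `local_usd_per_million_tokens` to a non-zero value accounts for electricity and maintenance, which pushes the break-even point further out.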

For IT and development teams, hardware with 128 GB of unified memory broadens the set of models that can run without offloading to remote GPUs. This capability directly supports teams that run many short experiments or require low-latency local inference.

Why this matters for product development

For product developers, the immediate implication is a sub-$5,000, compact appliance that expands the set of model experiments feasible on premises. A dedicated neural processing unit combined with a large unified memory pool allows teams to run short, low-latency local inference tests without waiting for cloud GPU allocation. Practitioners should validate the vendor benchmarks independently and track software support for production workflows before committing to heavy fine-tuning tasks.
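
Independent validation can start with a small timing harness. The sketch below is a minimal illustration: `tokens_per_second` and `fake_generate` are hypothetical names, and the stand-in generator should be replaced with a call into whatever local inference runtime is under test that returns the number of tokens it produced.

```python
import time
from typing import Callable


def tokens_per_second(generate: Callable[[str], int], prompt: str, runs: int = 3) -> float:
    """Average generation throughput over several runs.

    `generate` is any callable that runs one generation for `prompt`
    and returns the number of tokens it produced.
    """
    total_tokens = 0
    start = time.perf_counter()
    for _ in range(runs):
        total_tokens += generate(prompt)
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed


def fake_generate(prompt: str) -> int:
    """Hypothetical stand-in for a real model call: sleeps, then reports 32 tokens."""
    time.sleep(0.01)
    return 32


rate = tokens_per_second(fake_generate, "Explain unified memory.", runs=3)
print(f"{rate:.0f} tokens/s (stub generator, not a real measurement)")
```

For a fair comparison with vendor figures, keep the model, quantization, prompt length and output length identical across systems, and discard the first run if the runtime needs to warm up.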

