SK hynix bets big on NAND for AI with Nvidia SSDs and an HBF standard with SanDisk

SK hynix and NVIDIA are co-developing AI SSDs targeting ~10x speed, up to 100M IOPS by 2027. HBF flash with SanDisk targets AI-scale bandwidth; an alpha is slated next year.

Categorized in: AI News, Product Development
Published on: Dec 16, 2025

SK hynix + NVIDIA: SSDs and HBF for AI Inference - What Product Teams Should Prepare For

SK hynix is moving past its HBM wins and pushing into NAND-based storage for AI. The company is co-developing a next-gen SSD with NVIDIA under the project names "Storage Next" (NVIDIA) and "AI-N P" (SK hynix), targeting roughly 10x the performance of current SSDs. A prototype is planned for late next year, with a projection of up to 100 million IOPS by 2027.

In parallel, SK hynix and SanDisk are building a high-bandwidth flash (HBF) standard: stacked NAND structured for AI-scale bandwidth. An alpha is expected late January next year, with customer prototypes in 2027. Shinyoung Securities estimates HBF could open a market of around $1B in 2027 and reach $12B by 2030.

Why storage is now the bottleneck

GPUs solved training throughput by pairing with HBM, which widened the data pipe and kept cores busy. But inference is a different game: latency and capacity dominate.

For reference, GPT-4 inference is said to need about 3.6 TB, while a single HBM3E GPU offers roughly 192 GB. That often forces 6-7 GPUs per request, pushing service cost and complexity. Personalization lifts memory needs even further. Since HBM is volatile, it doesn't hold long-term user context. NAND does.
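
A back-of-the-envelope sketch helps make the capacity arithmetic concrete. The Python below is illustrative only; every figure in it is a placeholder assumption, not a vendor or model number. The point is simply that weights and KV cache compete for fixed HBM, while persistent user context can live on NAND.

```python
import math

GiB = 1024**3

def gpus_to_hold_weights(weight_bytes: float, hbm_gib_per_gpu: int) -> int:
    """GPUs needed just to keep the model weights resident in HBM."""
    return math.ceil(weight_bytes / (hbm_gib_per_gpu * GiB))

def sessions_per_gpu(hbm_headroom_gib: float, kv_cache_gib_per_session: float) -> int:
    """Concurrent sessions whose KV cache fits in the HBM left after weights."""
    return int(hbm_headroom_gib // kv_cache_gib_per_session)

def users_on_nand(nand_tib: float, context_gib_per_user: float) -> int:
    """Persistent per-user context is non-volatile, so it can sit on SSD/HBF."""
    return int(nand_tib * 1024 // context_gib_per_user)

# Hypothetical example: 1.5 TB of weights on 192 GiB-HBM GPUs, 4 GiB of KV cache
# per session, 0.5 GiB of stored context per user.
print(gpus_to_hold_weights(1.5e12, 192))  # -> 8
print(sessions_per_gpu(32, 4))            # -> 8
print(users_on_nand(30, 0.5))             # -> 61440
```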

What SK hynix is building

  • AI-N P (with NVIDIA): New SSD architecture and controllers optimized for large-scale AI inference I/O. The goal: reduce storage-compute stalls and improve perf-per-watt. PoC underway; prototype targeted for late next year; projected up to 100M IOPS in 2027.
  • AI-N B / HBF (with SanDisk): Stacked NAND akin to HBM's wide data path, but for non-volatile flash. Alpha expected late January; customer prototypes in 2027. Packaging leans on SK hynix's VFO (vertical fan-out) to connect along the die edge, avoiding TSV drilling and the yield hits that are tough on complex 3D NAND.
  • AI-N D: A middle-tier storage layer that targets TB-PB scale with SSD-like speed and HDD-like economics for inference-era workloads.

Key implications for product development

  • Architect for memory tiers: HBM for hot tensors; SSD/HBF for large weights, caches, and user context. Design clear data residency rules and move less data across tiers.
  • Target latency, not just throughput: Prototype with realistic token budgets and batch sizes. Track tail latencies at the pipeline level (GPU + network + storage).
  • Modernize the I/O path: Adopt async I/O, batched reads, and direct paths like NVIDIA GPUDirect Storage to cut CPU mediation (see the sketch after this list).
  • Plan for new standards: HBF may ship through GPU vendors or storage OEMs. Expect co-validation cycles and firmware dependencies. Avoid lock-in with abstraction layers where possible.
  • Budget for capacity at inference: Personalization requires persistent context. Model your per-user and per-session memory footprints up front.
  • Balance endurance and cost: Profile read/write ratios. Minimize write amplification with smarter sharding, prefetching, and compaction policies.
  • Thermals and density: Higher IOPS means heat. Validate airflow, power envelopes, and slot count early, especially with dense NVMe or future HBF modules.
  • Networking matters: Wide storage bandwidth is pointless if east-west traffic or PCIe lanes throttle end-to-end performance. Align PCIe Gen, NICs, and switch fabrics with your I/O targets.
  • Observability: Instrument queue depth, IOPS per query, bytes per token, and GPU stall reasons. Tie these to unit economics.
  • Roadmap alignment: Prototypes hit late next year; broader availability points to 2027. Line up vendor access, PoCs, and budget cycles now.
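
To make the I/O-path and observability bullets concrete, here is a minimal Python sketch, assuming model shards sit in an ordinary file on local NVMe. It issues one query's reads as a parallel batch through a thread pool and records the metrics worth instrumenting: bytes per query, read latency, and effective IOPS. The path and read layout are illustrative, and a production build would replace these CPU-mediated pread() calls with a direct route such as NVIDIA GPUDirect Storage.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class QueryIOStats:
    bytes_read: int
    read_latency_s: float
    iops: float

def read_batch(path: str, layout: list[tuple[int, int]], workers: int = 8) -> QueryIOStats:
    """Read one query's (offset, length) chunks in parallel and report I/O metrics.
    Uses CPU bounce buffers via os.pread; a GPUDirect Storage path would avoid them."""
    fd = os.open(path, os.O_RDONLY)
    start = time.perf_counter()
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            chunks = list(pool.map(lambda c: os.pread(fd, c[1], c[0]), layout))
    finally:
        os.close(fd)
    elapsed = time.perf_counter() - start
    total = sum(len(chunk) for chunk in chunks)
    return QueryIOStats(bytes_read=total,
                        read_latency_s=elapsed,
                        iops=len(layout) / elapsed if elapsed > 0 else 0.0)

if __name__ == "__main__":
    # Hypothetical layout: 64 x 1 MiB reads scattered across a weight shard file.
    layout = [(i * 16 * 2**20, 2**20) for i in range(64)]
    stats = read_batch("/data/model/shard0.bin", layout)  # path is illustrative
    print(f"{stats.bytes_read / 2**20:.0f} MiB in {stats.read_latency_s * 1e3:.1f} ms "
          f"({stats.iops:.0f} IOPS)")
```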

How this changes your build strategy

Training-optimized stacks won't carry you through inference at scale. You'll need non-volatile tiers that keep user context close and feed GPUs fast enough that they spend less time stalled on I/O.

The likely end state: HBM + HBF + SSD, each with a clear role, plus software that actually takes advantage of the layout.
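
What that software layer might look like is sketched below: a hypothetical placement policy that routes each object to HBM, HBF, or SSD based on access rate, size, and persistence. The tier names mirror the end state above, but the thresholds are placeholder assumptions to be replaced with measured numbers, not anything SK hynix or NVIDIA has specified.

```python
from enum import Enum

class Tier(Enum):
    HBM = "hbm"  # volatile, lowest latency: hot tensors, active KV cache
    HBF = "hbf"  # non-volatile, wide bandwidth: large weights, warm user context
    SSD = "ssd"  # non-volatile, cheapest per byte: cold context, checkpoints

def place(obj_bytes: int, reads_per_s: float, persistent: bool) -> Tier:
    """Hypothetical residency rule: small, hot, loss-tolerant data stays in HBM;
    persistent or large-but-warm data goes to HBF; everything cold goes to SSD."""
    if not persistent and reads_per_s > 1_000 and obj_bytes < 1 * 2**30:
        return Tier.HBM
    if reads_per_s > 10:  # warm: still on the inference path
        return Tier.HBF
    return Tier.SSD

# Example: a user's long-term context is persistent and warm, so it lands on HBF.
print(place(obj_bytes=256 * 2**20, reads_per_s=50, persistent=True))  # Tier.HBF
```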

Immediate actions for your team

  • Prototype NVMe-heavy inference nodes with direct storage paths; measure query cost vs. latency.
  • Define a memory map per model (hot, warm, cold) and enforce it in code, not slides (see the sketch after this list).
  • Start vendor dialogues on AI-N P and HBF PoCs; document interface and firmware assumptions.
  • Size your 2026-2027 infra budgets around capacity + latency, not just FLOPs.
  • Upskill the team on AI infra and storage-aware inference patterns.
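
One way to enforce a memory map in code rather than slides is a small declarative check that runs in CI or at service start-up. The tensor names, tiers, and device labels below are hypothetical placeholders; the point is the fail-fast pattern, not a specific layout.

```python
from dataclasses import dataclass

# Hypothetical memory map for one model; names and tiers are placeholders.
MEMORY_MAP = {
    "attention_weights": "hot",   # must sit in HBM
    "expert_weights":    "warm",  # HBF / fast NVMe, streamed on demand
    "user_history":      "cold",  # SSD or object store, loaded per session
}

ALLOWED_DEVICES = {
    "hot":  {"hbm"},
    "warm": {"hbf", "nvme"},
    "cold": {"ssd", "object_store"},
}

@dataclass
class Placement:
    name: str
    device: str

def enforce(placements: list[Placement]) -> None:
    """Fail fast if any declared object lands on a device its tier does not allow."""
    for p in placements:
        tier = MEMORY_MAP.get(p.name)
        if tier is None:
            raise ValueError(f"{p.name} is not declared in the memory map")
        if p.device not in ALLOWED_DEVICES[tier]:
            raise ValueError(f"{p.name} is '{tier}' but placed on {p.device}")

# Usage: the second placement violates the map and is rejected.
try:
    enforce([Placement("attention_weights", "hbm"),
             Placement("user_history", "hbm")])
except ValueError as err:
    print(err)  # user_history is 'cold' but placed on hbm
```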

Bottom line: inference is pushing past HBM's comfort zone. NAND, via faster SSDs and HBF, looks set to become a first-class citizen in AI system design. If you own product outcomes, build for it now.

