NHN Cloud to power Krafton's AI-First strategy with GPUaaS at scale
Krafton has picked NHN Cloud to operate its new GPU Cluster Business, aligning directly with the company's AI-First strategy. The partnership brings a GPU as a Service (GPUaaS) model that lets Krafton scale compute as AI workloads grow and shift. The infrastructure will go live in July after buildout at NHN Cloud's Pangyo NCC.
What's being built
NHN Cloud is delivering a large-scale, multi-cluster GPU environment centered on 1,000 NVIDIA Blackwell Ultra GPUs. An 800Gb/s XDR InfiniBand fabric will connect the clusters, moving large datasets between GPUs with low latency and high throughput.
This stack is built to run everything from rapid prototyping to large LLM training and high-volume inference. The design goal: maximize usable GPU hours per budget and sustain performance under mixed, concurrent workloads.
- Compute: 1,000 NVIDIA Blackwell Ultra GPUs
- Network: 800Gb/s XDR InfiniBand for fast, predictable data movement
- Location: NHN Cloud Pangyo NCC with high-density power, cooling, and connectivity
Why this matters for executives
- Faster AI delivery: Standardized GPUaaS cuts provisioning time and removes bottlenecks between R&D, training, and inference.
- Flexible spending: Dynamic allocation reduces idle capacity and improves budget efficiency across teams and projects.
- Operational focus: NHN Cloud handles build and operations so Krafton can focus on platform, data, and product outcomes.
How the cluster will operate
The environment uses a multi-cluster architecture with dynamic resource management so multiple tasks can safely share GPUs. Resources are partitioned and adjusted in real time, minimizing idle time and smoothing peaks across development, training, and inference.
Slurm-based scheduling, aligned with both Kubernetes and HPC usage patterns, enables stable operation even with mixed workloads at scale. The environment also supports integrations with AI tooling and external systems, keeping teams productive without heavy lifting from infrastructure teams.
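The dynamic-partitioning idea above can be sketched as a toy allocator: jobs from different teams request GPUs from a shared pool, the highest-priority pending work is admitted first, and capacity returns to the pool the moment a job finishes. Job names, priorities, and GPU counts here are invented for illustration and are not Krafton's or NHN Cloud's actual configuration or scheduler.

```python
from dataclasses import dataclass

POOL_SIZE = 1000  # total GPUs in the shared pool (the announced cluster size)

@dataclass
class Job:
    name: str
    gpus: int
    priority: int  # lower value = more urgent

class GpuPool:
    """Toy dynamic allocator: grants GPUs to the highest-priority
    pending jobs first, and reclaims capacity on release."""

    def __init__(self, size: int = POOL_SIZE):
        self.free = size
        self.running: dict[str, int] = {}
        self.pending: list[Job] = []

    def submit(self, job: Job) -> None:
        self.pending.append(job)
        self._schedule()

    def release(self, name: str) -> None:
        self.free += self.running.pop(name)
        self._schedule()  # reclaimed GPUs go straight to waiting jobs

    def _schedule(self) -> None:
        # Highest priority first; jobs that don't fit yet keep waiting.
        self.pending.sort(key=lambda j: j.priority)
        still_waiting = []
        for job in self.pending:
            if job.gpus <= self.free:
                self.free -= job.gpus
                self.running[job.name] = job.gpus
            else:
                still_waiting.append(job)
        self.pending = still_waiting

pool = GpuPool()
pool.submit(Job("llm-pretrain", 768, priority=0))
pool.submit(Job("inference-fleet", 200, priority=1))
pool.submit(Job("prototype-sweep", 100, priority=2))  # waits: only 32 GPUs free
pool.release("inference-fleet")                        # frees 200 -> sweep starts
```

A production scheduler such as Slurm adds fair-share accounting, preemption, and topology-aware placement on top of this basic admit/reclaim loop.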
Expected outcomes
- Higher utilization: Better GPU saturation under mixed loads means more throughput per dollar.
- Consistent performance: High-speed interconnects keep large model training and data-heavy jobs moving.
- Cleaner operating model: One platform for AI development, training, and inference simplifies governance and reporting.
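The "throughput per dollar" point above is simple arithmetic: the effective cost of a GPU-hour that actually does work rises as utilization falls. The hourly rate below is a made-up placeholder, not NHN Cloud pricing.

```python
HOURLY_RATE = 3.00  # $/GPU-hour, hypothetical placeholder rate

def cost_per_usable_gpu_hour(utilization: float) -> float:
    """Dollars paid per GPU-hour that performs useful work."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return HOURLY_RATE / utilization

for u in (0.50, 0.75, 0.90):
    print(f"{u:.0%} utilization -> ${cost_per_usable_gpu_hour(u):.2f}/usable GPU-hour")
```

At 50% utilization every useful GPU-hour effectively costs twice the list rate, which is why saturation under mixed loads is the headline efficiency lever.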
Timeline and site
The build is underway at NHN Cloud's Pangyo NCC, a facility equipped for high-density GPU operations with strong power, cooling, and network capacity. Completion is slated for July, followed by full operational support focused on stable, predictable clusters.
What leaders should track next
- Clear quotas and cost controls by team and project, backed by chargeback or showback.
- SLAs for training throughput, job latency, and incident response.
- Standardized pipelines for data access, model versioning, and rollout to inference.
- Capacity planning tied to model roadmap and product milestones.
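The chargeback/showback item in the list above reduces to attributing a shared cluster bill to teams in proportion to measured GPU-hours. Team names, usage figures, and the monthly cost below are invented for illustration.

```python
# Toy showback report: split shared-cluster cost by GPU-hours consumed.
usage_gpu_hours = {
    "training": 18_000,
    "inference": 9_000,
    "research": 3_000,
}
monthly_cluster_cost = 90_000.0  # hypothetical total, in dollars

total_hours = sum(usage_gpu_hours.values())
showback = {
    team: round(monthly_cluster_cost * hours / total_hours, 2)
    for team, hours in usage_gpu_hours.items()
}
# Each team's share sums back to the full cluster cost.
```

Chargeback uses the same calculation but actually bills the amounts; showback only reports them, which is often the easier first step politically.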
If you're planning a similar scale-up and want your teams skilled on AI platforms, MLOps, and LLM workflows, explore role-based AI courses to speed up onboarding and execution.