Plano's Next Big GPU Hub: What IT and Dev Teams Should Plan For Now
AI infrastructure in the Metroplex is scaling up fast. In southeast Plano, a two-story, 425,000-square-foot facility at 601 N. Star Road is moving through late-stage construction to support thousands of server racks for high-density compute.
Aligned Data Centers LLC is building the site for Lambda Inc., a cloud-computing company backed by Nvidia. The $700 million project will phase in capacity through next year, ultimately targeting 72 megawatts of power capacity.
Key facts
- Location: 601 N. Star Road, Plano, TX
- Developer: Aligned Data Centers LLC
- Customer: Lambda Inc. (backed by Nvidia)
- Size: 425,000 sq. ft., two stories; thousands of server racks
- Budget: $700 million
- Total capacity: 72 MW by project completion next year
- Near-term delivery: First 9 MW ready for server installs in June; additional capacity delivered in 9 MW phases
Why 72 MW matters for your workloads
AI training clusters are pushing densities into the tens of kilowatts per rack, especially for GPU-heavy nodes. A 72 MW target signals ample runway for large, horizontally scaled training fleets and low-latency HPC fabrics.
If you're running GPU-first pipelines, this kind of capacity supports growth without juggling multiple fragmented sites. For teams consolidating scattered GPU resources, the phased approach helps stage migrations and avoid risky, all-at-once cutovers.
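To make the 72 MW figure concrete, here is a hedged back-of-envelope sketch. The rack density (40 kW) and PUE (1.3) are illustrative assumptions, not published specs for this facility; only the 72 MW and 9 MW figures come from the announcement.

```python
# Back-of-envelope: racks supported by a given power envelope.
# kW-per-rack and PUE below are illustrative assumptions.

def racks_supported(total_mw: float, kw_per_rack: float, pue: float = 1.3) -> int:
    """Racks that fit after cooling/overhead is deducted via PUE."""
    it_kw = (total_mw * 1000) / pue   # power left for IT load
    return int(it_kw // kw_per_rack)

# 72 MW facility, assumed 40 kW GPU racks, assumed PUE of 1.3
print(racks_supported(72, 40))   # 1384
# The first 9 MW phase at the same assumed density
print(racks_supported(9, 40))    # 173
```

Rerun the numbers with your own rack design; at 100+ kW liquid-cooled racks the rack count drops sharply but per-rack GPU count rises.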
Timeline: what to prepare before June (first 9 MW)
- Procurement lead times: Align server, GPU, and NIC delivery windows with the June slot to avoid stranded power.
- Cluster bootstrap: Prebuild golden images, container base layers, and CI/CD runners to accelerate rack bring-up.
- Networking plan: Lock topology (InfiniBand or RoCE), oversubscription targets, and spine-leaf bill of materials.
- Data staging: Preposition training data and checkpoints to reduce day-one egress/ingress bottlenecks.
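The phased delivery above can be turned into a simple planning table. The quarterly cadence below is an assumption for illustration; only the June start, the 9 MW increment, and the 72 MW total come from the project facts.

```python
# Sketch: cumulative capacity by phase. 9 MW increments and the 72 MW
# total are from the announcement; the quarterly cadence is assumed.

PHASE_MW = 9
TOTAL_MW = 72

def phase_schedule(start_month: int = 6, months_between: int = 3):
    """Yield (phase_number, month_offset, cumulative_mw) per phase."""
    phases = TOTAL_MW // PHASE_MW   # 8 phases
    for i in range(phases):
        yield i + 1, start_month + i * months_between, (i + 1) * PHASE_MW

for phase, month, cum in phase_schedule():
    print(f"phase {phase}: month {month}, cumulative {cum} MW")
```

Mapping hardware purchase orders onto rows of this table is an easy way to spot stranded-power risk: any phase with power but no GPUs scheduled to land is wasted spend.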
Networking and storage: design decisions to lock now
AI performance hinges on low-latency fabrics and predictable east-west throughput. Commit to your fabric early (HDR/NDR InfiniBand or 100/200/400G Ethernet with RoCE) and map pod sizes to match failure domains and job scheduling.
For storage, define a tiered approach: high-IOPS NVMe for hot datasets and checkpoints, plus scalable object storage for cold data. Validate bandwidth targets against your largest multi-node jobs, not averages.
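One concrete way to validate against your largest jobs rather than averages: size storage bandwidth to the checkpoint-flush window of the biggest job you expect to run. The checkpoint size and window below are hypothetical.

```python
# Sketch: aggregate write bandwidth needed to flush a full checkpoint
# of your largest job within a target window. Inputs are assumptions.

def required_gbps(checkpoint_tb: float, window_s: float) -> float:
    """GB/s needed to write `checkpoint_tb` terabytes in `window_s` seconds."""
    return checkpoint_tb * 1000 / window_s

# e.g. a hypothetical 30 TB checkpoint flushed in 120 s
print(required_gbps(30, 120))   # 250.0 GB/s aggregate
```

If the result exceeds what your NVMe tier can sustain, either widen the window (and accept longer exposure to job loss) or add parallel write paths before day one.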
Observability, SRE, and reliability
- Telemetry: Standardize node-exporters, DC-side power/thermal feeds, and job-level metrics for queue health.
- Runbooks: Cover cluster bring-up, fabric faults, GPU health drift, and thermal throttling conditions.
- Capacity policy: Set per-tenant quotas and preemption rules to keep critical training on track during phase transitions.
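The capacity-policy bullet can be sketched as a minimal quota-and-preemption model. Tenant names, quotas, and the priority scheme are made up for illustration; a real deployment would encode this in your scheduler (e.g., Slurm QOS or Kubernetes ResourceQuotas), not hand-rolled code.

```python
# Minimal sketch of per-tenant quota checks and preemption ordering.
# All tenants and limits here are hypothetical.

from dataclasses import dataclass

@dataclass
class Tenant:
    name: str
    gpu_quota: int     # hard cap on GPUs
    gpu_in_use: int
    priority: int      # higher priority survives contention

def can_schedule(t: Tenant, gpus_requested: int) -> bool:
    """Admit a job only if it fits inside the tenant's quota."""
    return t.gpu_in_use + gpus_requested <= t.gpu_quota

def preemption_candidates(tenants: list[Tenant]) -> list[Tenant]:
    """Preempt lowest-priority, heaviest users first."""
    return sorted(tenants, key=lambda t: (t.priority, -t.gpu_in_use))

team_a = Tenant("team-a", gpu_quota=64, gpu_in_use=32, priority=2)
team_b = Tenant("team-b", gpu_quota=64, gpu_in_use=48, priority=1)
print(can_schedule(team_a, 32))                      # True
print(preemption_candidates([team_a, team_b])[0].name)  # team-b
```

The useful part during phase transitions is the ordering function: when a 9 MW block is briefly oversubscribed, preemption hits low-priority tenants deterministically instead of whoever scheduled last.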
Colocation and compliance checklist
- Cross-connects: Reserve ports and wavelengths early; confirm routes to your primary cloud regions and WAN hubs.
- Data governance: Map dataset residency and retention to audit requirements before first ingest.
- Resilience: If single-site, define RPO/RTO and simulate failover for schedulers, metadata, and checkpointing.
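For the RPO/RTO bullet, a small sketch shows how a recovery point objective bounds your checkpoint cadence. The 15-minute RPO and 2-minute write time are hypothetical inputs.

```python
# Sketch: maximum checkpoint interval implied by an RPO.
# Worst-case loss is roughly (interval + in-flight write time), so keep
# interval + write_time <= RPO. Input values are illustrative.

def max_checkpoint_interval_s(rpo_s: float, checkpoint_write_s: float) -> float:
    """Longest safe gap between checkpoint starts for a given RPO."""
    return max(rpo_s - checkpoint_write_s, 0.0)

# Hypothetical 15-min RPO with a 2-min checkpoint write
print(max_checkpoint_interval_s(900, 120))   # 780.0 -> checkpoint every <= 13 min
```

Note the coupling with the storage section above: a tighter RPO forces more frequent checkpoints, which raises the sustained write bandwidth your NVMe tier must absorb.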
Who's involved
The facility is being developed by Aligned Data Centers for Lambda. As additional 9 MW blocks come online, teams can scale in predictable steps without re-architecting clusters each time.
Action items if you plan to deploy here
- Book capacity windows aligned to hardware arrivals; avoid mismatches between rack power and GPU delivery.
- Freeze your fabric design and cabling standards; pre-stage configs to cut turn-up time.
- Pre-test storage performance against your largest distributed jobs; tune stripe sizes and caching policies.
- Automate node onboarding: firmware baselines, GPU drivers, Kubernetes/Slurm agents, and security hardening.
- Coordinate with security early for key management, secrets rotation, and per-tenant isolation.
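The node-onboarding bullet above can be sketched as an idempotent step list. The step names are placeholders for your own tooling (image bake, Ansible, scheduler agents); nothing here calls a real API.

```python
# Sketch of idempotent node onboarding: rerunning only performs the
# steps a node still lacks. Step names are hypothetical placeholders.

ONBOARDING_STEPS = [
    "verify_firmware_baseline",
    "install_gpu_driver",
    "join_scheduler",            # e.g. Kubernetes kubelet or Slurm slurmd
    "apply_security_hardening",
]

def onboard(node: str, completed: set[str]) -> list[str]:
    """Return the steps still pending for `node`, in order."""
    return [s for s in ONBOARDING_STEPS if s not in completed]

# A node that already passed its firmware check
print(onboard("gpu-node-001", {"verify_firmware_baseline"}))
```

Keeping onboarding idempotent matters most at 9 MW phase boundaries, when hundreds of nodes arrive at once and reruns after partial failures are routine.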
Level up team readiness
If you're responsible for provisioning, scaling, or operating AI infrastructure, this rollout is a strong prompt to tighten automation, observability, and operational playbooks. For practical upskilling on infra automation and server operations, see the AI Learning Path for Systems Administrators.