AI data centres as grid-interactive assets
Published: 05 December 2025
AI demand is pushing electricity systems to their limits. Interconnection queues are long, upgrades are expensive, and communities end up paying for new wires and substations. There's a faster path: make data centres flexible so they can support the grid instead of stressing it.
A recent field test showed what this looks like in practice. On a 256-GPU cluster in a hyperscale facility in Phoenix, Arizona, software controls cut the cluster's electrical draw by 25% for three hours during peak demand, while meeting stated quality-of-service guarantees. No batteries. No new hardware. Just smarter orchestration.
What was tested
The system coordinated AI workloads in response to live grid signals. It relied on workload tagging, GPU wattage caps via DVFS, controlled job start/stop, and safe checkpoints. During grid events, it reduced the site's instantaneous draw without breaking model training or latency targets.
Key takeaway: data centres can act like flexible grid resources and still deliver throughput. That means better reliability for utilities and lower bills for operators stuck with high peak charges.
How the orchestration works
- Signal intake: ingest utility or market signals (peak alerts, demand response calls, pricing, or telemetry).
- Workload policy: classify jobs by flexibility (latency-sensitive inference vs. elastic training) and set guardrails.
- Actuation: apply GPU wattage caps (DVFS), pause/resume or defer jobs, and manage admission control.
- Feedback: track cluster-level kW, per-GPU telemetry, and job throughput to refine controls in real time.
A simulator estimated expected draw and throughput under different caps, and the controller then enforced safe limits. The simulator's predictions aligned closely with measured draw during events, giving operators confidence to scale the approach.
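In code terms, that loop is small. Here is a minimal sketch of the signal-to-actuation cycle, assuming hypothetical `grid` and `cluster` client objects and illustrative wattage thresholds (none of the names or numbers come from the field test):

```python
import time

# Illustrative workload classes (not the tested policy).
PROTECTED = {"latency_inference", "interactive"}   # never capped below their floor
FLEXIBLE = {"elastic_training", "batch"}           # safe to cap, pause, or defer

def control_loop(cluster, grid, normal_cap_w=700, event_cap_w=500, period_s=60):
    """Poll grid signals and shape cluster draw; `cluster` and `grid` are
    hypothetical clients wrapping your scheduler, NVML, and utility APIs."""
    while True:
        signal = grid.latest()                     # peak alert, DR call, or price
        cap_w = event_cap_w if signal.event_active else normal_cap_w

        for job in cluster.jobs():
            if job.workload_class in PROTECTED:
                continue                           # hold the SLA floor, leave cap alone
            for gpu in job.gpus():
                gpu.set_power_cap_w(cap_w)         # DVFS wattage cap

        # Feedback: check site-level draw against the committed reduction.
        site_kw = cluster.site_power_kw()
        if signal.event_active and site_kw > signal.committed_kw:
            cluster.defer_lowest_priority_batch()  # shed more if still over target

        time.sleep(period_s)
```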
Why this matters for IT, engineering, and operations
- Cut peak charges and earn demand response revenue with software you can test in days, not months.
- Make interconnection easier by proving you can stay within feeder limits under stress.
- Hold SLAs by tagging workloads and using checkpointing, so flexible jobs carry the load during events.
What the figures showed (in plain terms)
- Utility events: sustained 25% load reduction for three hours at a 256-GPU scale while meeting QoS.
- Historical replay: re-enactment of a 2020 California stress event showed the cluster could have helped stabilize demand.
- Simulator vs. meters: close alignment between predicted and measured draw during capped operation.
- Throughput under caps: training and inference maintained acceptable throughput within the defined performance floors while GPU wattage caps were in force.
Implementation checklist
- Inventory controls: confirm per-GPU DVFS wattage caps and telemetry (e.g., via vendor tools and APIs; see the NVML sketch after this checklist).
- Tag workloads: define classes (latency-critical, interactive, elastic training, batch) and minimum performance floors.
- Add safe checkpoints: ensure resumability for long-running training and batch jobs.
- Wire into your scheduler: integrate caps and pause/resume with Slurm, Kubernetes, Ray, or your job service (see the Slurm sketch after this checklist).
- Unify metering: measure site-level kW and per-GPU draw; store time series for audits and settlement.
- Start small: test 10-20% cluster caps for 60-120 minutes; verify SLAs and iterate.
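For the inventory step, NVIDIA GPUs expose power caps and draw through NVML; the `nvidia-smi -pl` flag does the same from a shell. A minimal sketch using the `pynvml` bindings (the 250 W target is illustrative, and setting the limit requires elevated privileges):

```python
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)

        # Supported cap range for this device, reported in milliwatts.
        min_mw, max_mw = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)

        # Clamp an illustrative 250 W target into the supported range.
        target_mw = min(max(250_000, min_mw), max_mw)
        pynvml.nvmlDeviceSetPowerManagementLimit(handle, target_mw)  # needs admin rights

        # Telemetry: current draw in milliwatts.
        draw_mw = pynvml.nvmlDeviceGetPowerUsage(handle)
        print(f"GPU {i}: cap {target_mw / 1000:.0f} W, drawing {draw_mw / 1000:.0f} W")
finally:
    pynvml.nvmlShutdown()
```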
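For the scheduler step, the simplest actuation is to suspend flexible Slurm jobs when an event starts and resume them when it ends. A sketch only: it suspends every running job, whereas a real policy would filter on a workload-class tag such as a QOS or job comment.

```python
import subprocess

def running_job_ids():
    """List running Slurm job IDs (assumes, for illustration, that all are flexible)."""
    out = subprocess.run(
        ["squeue", "-h", "-t", "RUNNING", "-o", "%A"],
        capture_output=True, text=True, check=True,
    )
    return [line.strip() for line in out.stdout.splitlines() if line.strip()]

def shed_load(job_ids):
    """Suspend flexible jobs at the start of a grid event."""
    for jid in job_ids:
        subprocess.run(["scontrol", "suspend", jid], check=True)

def restore_load(job_ids):
    """Resume them once the event window closes."""
    for jid in job_ids:
        subprocess.run(["scontrol", "resume", jid], check=True)
```

Pair this with timer-based checkpointing inside long-running training and batch jobs, so that a suspension, node drain, or power event never costs more than one checkpoint interval.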
Which grid programs to target
- Peak events and demand response: 1-4 hour windows, day-ahead or day-of calls.
- Coincident peak shaving: reduce load during your utility's system peak to cut annual demand charges (a back-of-the-envelope example follows this list).
- Ancillary services: fast up/down flexibility if you can ramp quickly with verified telemetry and controls. See ERCOT's program overview for context: ERCOT ancillary services study.
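The value of coincident peak shaving is straightforward arithmetic. A back-of-the-envelope sketch with entirely illustrative numbers (your tariff, peak window, and shed capability will differ):

```python
# Illustrative figures only; substitute your own tariff and cluster numbers.
demand_charge = 15.0        # $/kW per month, billed on coincident peak
cluster_peak_kw = 2_000     # metered peak draw of the cluster (kW)
shed_fraction = 0.25        # 25% reduction held through the peak window

avoided_kw = cluster_peak_kw * shed_fraction
annual_savings = avoided_kw * demand_charge * 12
print(f"Avoided peak: {avoided_kw:.0f} kW -> roughly ${annual_savings:,.0f}/year")
# Avoided peak: 500 kW -> roughly $90,000/year
```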
Practical guardrails
- QoS enforcement: never cap latency-critical services below a tested floor (see the policy sketch after this list).
- Checkpoint budgets: set a minimum window between checkpoints to limit overhead.
- Fairness: rotate who gets capped, or discount those jobs, to keep users on board.
- Thermals: validate cooling response to step changes in draw to avoid hotspots.
- Auditability: time-sync logs, meters, and job states for settlement and compliance.
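The QoS and fairness guardrails reduce naturally to a small policy table. A sketch with made-up workload classes and watt floors (none of these values come from the test):

```python
# Hypothetical policy table: tested floors and flexibility per workload class.
POLICY = {
    "latency_inference": {"min_cap_w": 700, "flexible": False},
    "interactive":       {"min_cap_w": 600, "flexible": False},
    "elastic_training":  {"min_cap_w": 350, "flexible": True},
    "batch":             {"min_cap_w": 300, "flexible": True},
}

def safe_cap(job_class, requested_cap_w):
    """QoS enforcement: clamp a requested cap to the class's tested floor."""
    return max(requested_cap_w, POLICY[job_class]["min_cap_w"])

def rotate_capped_jobs(flexible_jobs, fraction_needed, event_index):
    """Fairness: rotate which flexible jobs absorb each successive event."""
    if not flexible_jobs:
        return []
    n = max(1, int(len(flexible_jobs) * fraction_needed))
    start = (event_index * n) % len(flexible_jobs)
    ring = flexible_jobs[start:] + flexible_jobs[:start]
    return ring[:n]
```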
Starter reference architecture
- Signal Adapter: utility/market APIs, pricing feeds, and on-prem meter streams.
- Decision Engine: policies for caps, job deferral, and SLA floors; simulator for "what-if."
- Actuator: DVFS cap service, cluster scheduler hooks, checkpoint/resume controller.
- Telemetry & Store: per-GPU metrics, site kW, and job throughput; dashboards and alerts.
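One way to keep those four components decoupled is to pin down their interfaces before any implementation. A sketch in Python (component names mirror the list above; the method names and plan shape are assumptions):

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class GridSignal:
    event_active: bool
    committed_kw: float
    price_usd_per_mwh: float

class SignalAdapter(Protocol):
    def latest(self) -> GridSignal: ...

class DecisionEngine(Protocol):
    # Returns an actuation plan, e.g. {"gpu_cap_w": 500, "defer_jobs": [...]}
    def plan(self, signal: GridSignal, site_kw: float) -> dict: ...

class Actuator(Protocol):
    def apply(self, plan: dict) -> None: ...

class TelemetryStore(Protocol):
    def record(self, site_kw: float, per_gpu_w: list[float]) -> None: ...
```

Keeping the Decision Engine pure (signals and telemetry in, a plan out) lets you run the same policies against the simulator before they ever touch hardware.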
Data and code
The dataset includes DVFS control sweeps and time-series draw from utility experiments, plus the simulator outputs used to match measured draw. The repo also provides Python code and a Docker image to apply wattage caps, job start/stop, and forced checkpoints to LLM Foundry workloads, along with pseudocode for the orchestration algorithms.
Explore the artefacts and code: GitHub: emerald-ai-demo-may-2025
What to do next
- Run a tabletop test with your utility: define event triggers, duration, telemetry, and settlement method.
- Pilot a 10-20% load shed on a non-critical cluster slice for 60-180 minutes.
- Codify policies: workload classes, SLA floors, and who gets capped when.
- Move to production: integrate with billing, metering, and incident response.
Upskill your team
If you're standing up MLOps and infrastructure practices to support this kind of flexibility, curated training can speed up adoption. See role-based options under Courses by job, or browse the latest programs under Latest AI courses.
Bottom line
AI data centres can help stabilize the grid and cut costs with software controls that shape electrical draw in real time. The Phoenix field test shows it's viable at scale with today's GPUs and schedulers. Start small, measure everything, and turn flexibility into an operational advantage.