NVIDIA and Oracle to Deliver 2,200 Exaflops of AI Supercomputing for DOE Research
NVIDIA and Oracle are collaborating with the U.S. Department of Energy to deploy two AI supercomputers at Argonne National Laboratory. The systems - Solstice and Equinox - are projected to deliver a combined 2,200 exaflops of AI performance, according to NVIDIA. The goal: shorten research cycles and scale frontier model development across science, security, and energy use cases.
Solstice will be the flagship system with 100,000 NVIDIA Blackwell GPUs. Equinox, at 10,000 Blackwell GPUs, will add essential capacity and flexibility for diverse workloads. Argonne director Paul K. Kearns will oversee integration into existing research infrastructure.
Systems at a glance
- Solstice: 100,000 NVIDIA Blackwell GPUs for large-scale training and reasoning workloads.
- Equinox: 10,000 NVIDIA Blackwell GPUs to expand access and throughput for research teams.
- Networking: Interconnected with NVIDIA networking for high-throughput data movement and distributed training.
- Software stack: NVIDIA Megatron-Core for frontier model training and NVIDIA TensorRT for high-scale inference.
Why this matters for research programs
According to the announcement, the scale of Solstice and Equinox will support previously impractical experiments - from multi-physics simulators guided by AI to rapid pretraining and domain adaptation for scientific models. Expect shorter iteration loops for hypothesis testing, alongside higher-fidelity inference on complex, multi-modal datasets.
Argonne's deployment is meant to serve national priorities, including healthcare, materials discovery, and energy systems modeling. The DOE is positioning this as a public-private model to accelerate useful outcomes, backed by industry investment and real-world use cases.
Agentic scientists: what changes
The DOE, Argonne, and NVIDIA plan to advance "agentic scientists" - AI systems that can autonomously run experiments, iteratively refine plans, and report results. The intention is to grow R&D productivity and improve returns on public research funding within the decade.
Jensen Huang underscored that AI's strongest canvas is scientific discovery, and DOE leadership echoed the value of pragmatic partnerships. The collaboration was also framed as building on commitments established during the Trump administration.
Practical guidance for lab teams
- Plan for distributed scale: Design training runs with explicit tensor/pipeline parallelism strategies supported by Megatron-Core. Validate scaling laws and checkpoint schedules early (see the layout sanity check sketched after this list).
- Data readiness: Standardize data curation, versioning, and governance. Establish lineage and QA protocols before allocating large GPU hours (the manifest sketch after this list shows one way to record dataset checksums).
- Inference at scale: Profile models with TensorRT for target modalities (text, vision, multi-modal). Predefine accuracy/latency tradeoffs and evaluation gates (see the gate sketch after this list).
- Reproducibility: Lock containers and seeds, track compiler flags, and publish experiment manifests. Treat evaluation harnesses as first-class artifacts.
- Safety and compliance: Implement red-teaming and domain-specific risk tests. Align with lab policies for data security and export control from day one.
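On the distributed-scale point above, a common first step is simply verifying that a proposed parallelism layout is internally consistent before requesting GPU hours. The sketch below is a minimal, framework-agnostic check; the numbers and function name are illustrative assumptions, not Argonne or Megatron-Core defaults.

```python
# Minimal sketch: sanity-check a parallelism layout before requesting a large
# allocation. All numbers here are illustrative, not Argonne defaults.

def validate_layout(world_size: int, tensor_parallel: int, pipeline_parallel: int,
                    micro_batch: int, global_batch: int) -> int:
    """Return the implied data-parallel size, or raise if the layout is inconsistent."""
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError("world_size must be divisible by tensor * pipeline parallel size")
    data_parallel = world_size // model_parallel
    if global_batch % (micro_batch * data_parallel) != 0:
        raise ValueError("global batch must be divisible by micro_batch * data_parallel")
    grad_accum = global_batch // (micro_batch * data_parallel)
    print(f"data_parallel={data_parallel}, grad_accum_steps={grad_accum}")
    return data_parallel

# Example: a hypothetical 8,192-GPU run with 8-way tensor and 16-way pipeline parallelism.
validate_layout(world_size=8192, tensor_parallel=8, pipeline_parallel=16,
                micro_batch=1, global_batch=2048)
```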
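For the data-readiness and reproducibility items, one lightweight approach is to write a manifest that records dataset checksums, seeds, and environment details alongside each run. This is a sketch under stated assumptions: the file names and fields are placeholders, and frameworks such as PyTorch or NumPy would need their own seeds set in the same place.

```python
# Minimal sketch: capture a reproducibility manifest (dataset checksums, seed,
# environment) before launching a large run. Field names are illustrative.
import hashlib, json, platform, random

def sha256_of(path: str, chunk: int = 1 << 20) -> str:
    """Stream a file and return its SHA-256 digest for lineage tracking."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def write_manifest(data_files: list[str], seed: int,
                   out_path: str = "experiment_manifest.json") -> dict:
    random.seed(seed)  # set framework-specific seeds (torch, numpy, ...) the same way
    manifest = {
        "seed": seed,
        "python": platform.python_version(),
        "platform": platform.platform(),
        "data": {p: sha256_of(p) for p in data_files},
    }
    with open(out_path, "w") as f:
        json.dump(manifest, f, indent=2)
    return manifest
```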
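And for inference at scale, predefined evaluation gates can be expressed as a small amount of code that deployment tooling calls after profiling. The thresholds, metric names, and example values below are assumptions for illustration; the measurements themselves would come from whatever profiler the team uses (for example, TensorRT benchmarking runs).

```python
# Minimal sketch: a pre-agreed evaluation gate for promoting an optimized
# inference engine. Thresholds and metric names are project-specific placeholders.
from dataclasses import dataclass

@dataclass
class Gate:
    min_accuracy: float        # task score relative to the unoptimized baseline
    max_p95_latency_ms: float  # tail-latency budget
    min_throughput_qps: float  # sustained queries per second

def passes_gate(metrics: dict[str, float], gate: Gate) -> bool:
    """Return True only if the profiled engine meets every predefined threshold."""
    return (metrics["accuracy"] >= gate.min_accuracy
            and metrics["p95_latency_ms"] <= gate.max_p95_latency_ms
            and metrics["throughput_qps"] >= gate.min_throughput_qps)

# Example with made-up profiling results.
print(passes_gate({"accuracy": 0.912, "p95_latency_ms": 38.0, "throughput_qps": 540.0},
                  Gate(min_accuracy=0.90, max_p95_latency_ms=50.0, min_throughput_qps=500.0)))
```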
Access, integration, and collaboration
Paul K. Kearns will oversee integration at Argonne, bringing the new systems into the lab's established research pipelines. The networking design should enable cross-team collaboration, shared datasets, and consistent execution environments for large-scale experiments.
For program leads, this is an opportunity to coordinate shared model baselines, common benchmarks, and pooled evaluation suites. That reduces duplicated effort and speeds up comparative studies across projects.
Public-private model: how it's structured
The DOE's approach opens the door to industry participation, with infrastructure and software stack alignment driving faster deployment. According to the announcement, this model is intended to keep U.S. research competitive while focusing investments on impactful, measurable outcomes.
In practice, expect closer coordination on readiness reviews, service-level objectives for training and inference, and clearer pathways from proof-of-concept to production-scale studies.
What to watch next
- Queue policies and allocation: How compute time will be scheduled across institutions and programs.
- Baseline model releases: Availability of foundation models and domain-tuned checkpoints for reuse.
- Tooling maturity: Updates to Megatron-Core training features and TensorRT for multi-modal workloads.
- Benchmark transparency: Published metrics for energy efficiency, training time, and inference throughput.
According to NVIDIA, the combined Solstice and Equinox capacity is intended to enable frontier-scale experimentation across key scientific domains. For industry partners and national labs alike, the near-term value is clear: faster cycles, bigger hypothesis spaces, and more decisive validation.
Argonne National Laboratory will publish integration updates as deployment progresses. For architecture context, see NVIDIA's overview of the Blackwell platform on nvidia.com.
Upskilling for large-scale AI work
If your team is preparing for distributed training, evaluation at scale, or multi-modal pipelines, consider focused learning tracks. A curated catalog by role and stack is available here: AI courses by job.