NSF launches national data systems and selects key datasets to boost US AI research and innovation
The NSF launches the Integrated Data Systems and Services program and integrates 10 datasets into the NAIRR Pilot to boost AI research and workforce development. These efforts enhance national-scale data access and AI literacy.

August 28, 2025
NSF Advances National AI Infrastructure with New Data Systems and Resources
The U.S. National Science Foundation (NSF) has announced two key initiatives to boost America’s artificial intelligence capabilities. First, the new Integrated Data Systems and Services (IDSS) program aims to create national-scale data systems that enable seamless access to and sharing of scientific data. Second, NSF has selected 10 datasets to be integrated into the National Artificial Intelligence Research Resource (NAIRR) Pilot, supporting AI research, education, and workforce development.
Building a National Integrated Data Infrastructure with NSF IDSS
The IDSS program addresses a critical gap by funding the development and operation of large-scale systems that allow researchers across the U.S. to access, utilize, and share scientific data efficiently. Such infrastructure is vital for accelerating innovation and maintaining competitiveness in AI and other scientific fields.
Currently, NSF lacks dedicated programs for operational national-scale data systems. IDSS will fill this void by supporting platforms that serve research and education communities while interoperating with other federal science and data infrastructure efforts.
These systems will not only connect data but also integrate computing, instruments, and software. This integration will make AI development, data analysis, and scientific discovery faster, more reliable, and more reproducible.
IDSS supports three types of projects:
- Creation of new integrative data systems designed for national-scale needs.
- Scaling existing prototypes and regional systems to achieve national-level impact.
- Planning grants to develop ideas for future IDSS systems and services.
Additionally, the program will invest in workforce development to ensure skilled management and operation of these systems, strengthening the U.S. cyberinfrastructure for AI and scientific progress.
NAIRR Pilot: Selected Datasets to Support AI Literacy and Innovation
Alongside IDSS, NSF announced the selection of 10 datasets to be incorporated into the NAIRR Pilot. These datasets were chosen through a competitive process managed by NSF in partnership with 12 federal agencies. The goal is to support AI skill development across diverse learning environments and help grow the country’s AI-literate workforce.
The selected datasets come from respected institutions and cover a variety of domains:
- AI4Shipwrecks (University of Michigan)
- Turbulence Database (Johns Hopkins University)
- Cell Painting Gallery (Broad Institute)
- FathomNet (Monterey Bay Aquarium Research Institute)
- PatchDB (George Mason University)
- Phase-Field Fracture Simulation (Johns Hopkins University)
- SecureChain (Purdue University)
- Microbiome Preterm Birth DREAM Challenge Dataset (University of California, San Francisco)
- Industry Documents Library (University of California, San Francisco)
- OpenTopography (UC San Diego, Arizona State University, and the EarthScope Consortium)
These datasets span topics such as lidar-based terrain mapping, microbiome research, and software supply chain graphs. Several will integrate with NAIRR partner platforms, enhancing their utility for AI training and research.
High-quality, curated datasets like these enable the development of AI models targeted at specific scientific challenges and domains. Many of these datasets will be more deeply embedded into the NAIRR Pilot in the coming weeks.
“Data infrastructure and access to high-quality datasets are critical components of a thriving AI innovation ecosystem,” said Katie Antypas, director of the NSF Office of Advanced Cyberinfrastructure. “These efforts will sharpen America’s competitive edge and lay the foundation for leadership in science and innovation.”