NVIDIA Releases Open Blueprint for Physical AI Data Generation at Scale
NVIDIA announced the Physical AI Data Factory Blueprint on March 16, an open reference architecture designed to automate how training data is generated, processed and validated for robotics, autonomous vehicles and vision AI agents. The blueprint reduces the time and cost required to train physical AI systems by converting limited real-world data into large, diverse datasets including rare edge cases.
The architecture integrates NVIDIA Cosmos foundation models with cloud infrastructure from Microsoft Azure and Nebius. Eight companies, including Uber, Skild AI, Teradyne Robotics and FieldAI, are already using the blueprint to accelerate development.
How the Blueprint Works
The system moves teams from raw data to model-ready training sets through three automated steps:
- Curate and Search: NVIDIA Cosmos Curator processes, refines and annotates large-scale datasets from real-world and synthetic sources.
- Augment and Multiply: Cosmos Transfer expands and diversifies curated data, multiplying inputs across different environments and lighting conditions to capture long-tail scenarios.
- Evaluate and Validate: NVIDIA Cosmos Evaluator automatically scores and filters generated data to ensure physical accuracy before training.
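The three stages above form a simple linear pipeline. The sketch below illustrates that flow in plain Python; all function names and data shapes are hypothetical stand-ins, not the actual Cosmos Curator, Transfer or Evaluator APIs.

```python
# Illustrative sketch of the curate -> augment -> evaluate pipeline.
# Names and scoring logic are hypothetical, not the real Cosmos APIs.

def curate(raw_clips):
    """Stage 1 (stand-in for Cosmos Curator): drop unusable clips
    and attach annotations."""
    return [{"clip": c, "label": "annotated"} for c in raw_clips if c]

def augment(curated, conditions=("day", "night", "rain")):
    """Stage 2 (stand-in for Cosmos Transfer): multiply each curated
    clip across environments and lighting conditions."""
    return [{**item, "condition": v} for item in curated for v in conditions]

def evaluate(augmented, threshold=0.5):
    """Stage 3 (stand-in for Cosmos Evaluator): score samples for
    physical plausibility and keep only those above a threshold.
    The score here is mocked as a constant."""
    scored = [{**item, "score": 0.9} for item in augmented]
    return [s for s in scored if s["score"] >= threshold]

raw = ["clip_a", "clip_b", ""]           # the empty clip is dropped in curation
dataset = evaluate(augment(curate(raw)))
print(len(dataset))                      # 2 clips x 3 conditions = 6 samples
```

The key property the blueprint automates is the multiplicative step in the middle: a small curated set fans out across many conditions, then the evaluator gates what actually reaches training.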
NVIDIA is using the blueprint to train Alpamayo, an open vision-language-action model for autonomous driving. Skild AI applies it to general-purpose robot foundation models, while Uber uses it for autonomous vehicle development.
Agent-Driven Orchestration
Many robotics teams lack the infrastructure to manage data generation at scale. NVIDIA OSMO, an open source orchestration framework, unifies workflows across compute environments and integrates with coding agents like Claude Code and OpenAI Codex.
This enables AI-native operations where agents proactively manage resources and resolve bottlenecks, reducing manual tasks for developers.
Cloud Integration
Microsoft Azure is integrating the blueprint into an open physical AI toolchain available on GitHub. The integration includes Azure IoT Operations, Microsoft Fabric and Real-Time Intelligence services.
FieldAI, Hexagon Robotics, Linker Vision and Teradyne Robotics are testing the Azure toolchain for accelerating data generation and evaluation across perception and mobility pipelines.
Nebius has integrated OSMO into its AI Cloud, combining RTX PRO 6000 Blackwell GPUs with object storage, data management and serverless execution. Milestone Systems, Voxel51 and RoboForce are using the blueprint on Nebius infrastructure for video analytics, autonomous vehicles and industrial robotics.
The blueprint is expected to be available on GitHub in April.