NVIDIA at NeurIPS: Open Models and Tools for Digital and Physical AI
NVIDIA announced a wave of open models, datasets and tools at NeurIPS aimed at developers working on autonomous driving, robotics, speech, safety and reinforcement learning. A new Open Index by Artificial Analysis also recognized the company's Nemotron stack for openness and transparency across licensing, data and documentation.
Below is a practical breakdown of what matters, why it matters and where to start.
Physical AI: DRIVE Alpamayo-R1 and the Cosmos Stack
DRIVE Alpamayo-R1 (AR1): Reasoning VLA for Autonomous Driving
AR1 is an open, industry-scale reasoning vision-language-action (VLA) model for mobility. It blends chain-of-thought reasoning with path planning to handle nuanced road situations such as pedestrian-dense intersections, blocked lanes and tricky merges.
The model evaluates multiple trajectories, explains its decisions with reasoning traces and plans the next move using context. Post-training with reinforcement learning significantly boosts reasoning over the base model.
- Open foundation: Built on NVIDIA Cosmos Reason, available for non-commercial research.
- Access: Code on GitHub and Hugging Face; subset of training/eval data in NVIDIA Physical AI Open Datasets.
- Evaluation: AlpaSim released as an open framework to test AR1.
Cosmos Cookbook + Example Workflows
The Cosmos Cookbook gives step-by-step recipes for building physical AI with data curation, synthetic generation and evaluation. If you're building AV, robotics or simulation tooling, this is a useful starting point.
- LidarGen: Generates lidar data for AV simulation.
- Omniverse NuRec Fixer: Uses Cosmos Predict to clean artifacts in neurally reconstructed scenes (e.g., blur, holes, noisy novel views).
- Cosmos Policy: Turns large pretrained video models into effective robot policies.
- ProtoMotions3: Open, GPU-accelerated training for digital humans and humanoids with realistic scenes via Cosmos WFMs; integrates with NVIDIA Newton and Isaac Lab.
Policy models train in Isaac Lab and Isaac Sim, and their data can post-train NVIDIA GR00T N models. Partners using Cosmos WFMs include Voxel51 (contributing recipes), 1X, Figure AI, Foretellix, Gatik, Oxa, PlusAI and X-Humanoid. ETH Zurich researchers are showcasing cohesive 3D scene creation using Cosmos models at NeurIPS.
Digital AI: Nemotron + NeMo Additions
Speech, Safety and Synthetic Data
- MultiTalker Parakeet: Streaming ASR that handles multiple speakers, even in overlapped speech.
- Sortformer: Real-time diarization that separates speakers accurately.
- Nemotron Content Safety Reasoning: Policy-aware reasoning model for AI safety across domains.
- Nemotron Content Safety Audio Dataset: Synthetic dataset for training models to detect unsafe content across text and audio.
- NeMo Gym: Open-source library for building RL environments for LLM training, including RLVR-ready environments.
- NeMo Data Designer Library (Apache 2.0): End-to-end toolkit for synthetic data generation, validation and refinement for domain-specific model customization and evaluation.
Ecosystem partners using Nemotron and NeMo for secure, specialized agentic AI include CrowdStrike, Palantir and ServiceNow.
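RLVR (reinforcement learning with verifiable rewards) depends on rewards that a program can check rather than a learned judge. The sketch below shows that idea in plain Python and does not use the NeMo Gym API. The `Answer: <number>` output convention and the names `extract_answer` and `verifiable_reward` are assumptions made for illustration only.

```python
import re
from typing import Optional

# Hypothetical convention: the model ends its response with "Answer: <number>".
ANSWER_PATTERN = re.compile(r"Answer:\s*(-?\d+(?:\.\d+)?)")


def extract_answer(completion: str) -> Optional[float]:
    """Return the last 'Answer: <number>' value in the completion, or None."""
    matches = ANSWER_PATTERN.findall(completion)
    return float(matches[-1]) if matches else None


def verifiable_reward(completion: str, expected: float, tol: float = 1e-6) -> float:
    """Binary reward: 1.0 if the extracted answer matches the expected value."""
    answer = extract_answer(completion)
    if answer is None:
        # No parseable answer counts as a failure.
        return 0.0
    return 1.0 if abs(answer - expected) <= tol else 0.0


if __name__ == "__main__":
    print(verifiable_reward("6 * 7 is 42. Answer: 42", expected=42.0))
    print(verifiable_reward("I am not sure.", expected=42.0))
```

A binary, programmatically checked reward like this is easy to audit. Real environments typically add format checks and task-specific verifiers such as unit tests for code.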
Open Recognition: Nemotron Rated Highly for Openness
Artificial Analysis' Open Index ranks the Nemotron family among the most open stacks for frontier AI development. Criteria include permissive licensing, transparent data practices and detailed technical documentation.
Research Highlights at NeurIPS
- Audio Flamingo 3: Fully open large audio language model that reasons across speech, sound and music, handling audio segments up to 10 minutes with strong results on 20+ benchmarks.
- Minitron-SSM: Group-aware SSM pruning that compresses hybrid models; example: Nemotron-H 8B pruned/distilled to 4B with higher accuracy than peers and 2x faster inference.
- Jet-Nemotron: Cost-efficient post-training pipeline delivering hybrid architectures that meet or beat full-attention baselines with higher throughput.
- Nemotron-Flash: Small language model architecture optimized for real-world latency, not just parameter count, with strong speed and accuracy.
- ProRL: Prolonged reinforcement learning that extends training duration to consistently improve reasoning over base models.
NeurIPS runs through Sunday, Dec. 7, in San Diego. Explore the conference schedule and sessions on the official site.
Why This Matters for Engineers
- AV and robotics teams: AR1's reasoning traces plus AlpaSim accelerate test cycles and failure analysis. Cosmos Cookbook shortens the path from data to working policy.
- Speech and analytics teams: MultiTalker Parakeet and Sortformer help build reliable multi-speaker apps for meetings, contact centers and collaboration tools.
- Safety and compliance: Nemotron Content Safety Reasoning and the audio dataset help standardize guardrails across text and audio.
- Training and tuning: NeMo Gym and Data Designer streamline RL environments and synthetic data pipelines for domain adaptation.
- Infra efficiency: Research on pruning, hybrid architectures and latency-first SLMs points to lower-cost deployments with faster throughput.
Get Started
- Pull the AR1 repo on GitHub and test with AlpaSim; evaluate reasoning traces on challenging scenes before domain adaptation.
- Use Cosmos Cookbook recipes to build your synthetic data loop; plug into Isaac Lab/Sim and iterate policies fast.
- Prototype diarization with Sortformer and streaming ASR with MultiTalker Parakeet; benchmark on your noisy, overlapped audio.
- Add NeMo Gym environments to your RL pipeline and use Data Designer to generate/refine synthetic sets for your vertical.
- Plan a safety review using Nemotron Content Safety Reasoning and the audio dataset across your text/audio endpoints.
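For the ASR benchmarking step above, a common starting metric is word error rate (WER): word-level edit distance divided by the number of reference words. The sketch below is model-agnostic and assumes you already have reference and hypothesis transcripts as strings. The function name `word_error_rate` and the simple lowercase-and-split normalization are illustrative choices, not part of any NVIDIA tool.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    print(f"WER: {word_error_rate(ref, hyp):.3f}")
```

For multi-speaker audio, plain WER does not capture who said what. Speaker-attributed metrics and diarization error rate are the usual complements once a baseline like this is in place.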