V-JEPA 2: Advancing AI Understanding of the Physical World
V-JEPA 2 is a new world model trained on video that helps AI agents, including robots, better grasp the physical environment and anticipate how it will react to their actions. This ability to think before acting is a key step toward developing advanced machine intelligence (AMI).
As humans, we naturally predict how the physical world will change in response to our actions or those of others. For example, when we toss a tennis ball into the air, we expect gravity to pull it back down. Walking through a crowded space, we adjust our movements to avoid collisions. In hockey, players skate to where the puck will be, not where it is now. This intuition comes from an internal model that we build through observation and use to forecast the outcomes of possible actions.
V-JEPA 2 equips AI agents with similar skills, improving their "physical intuition" so they can interact more effectively with their surroundings.
What Are World Models?
World models are AI systems that provide three core functions:
- Understanding: Interpreting the environment and its components.
- Predicting: Anticipating how the environment will change over time or in response to actions.
- Planning: Deciding on sequences of actions based on predicted outcomes.
Building on the original V-JEPA model released last year, V-JEPA 2 improves the first two functions—understanding and predicting—allowing robots to handle unfamiliar objects and new environments during tasks.
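To make the three functions concrete, here is a minimal toy sketch in Python. It is not V-JEPA 2's architecture or API: the function names (encode, predict, plan), the 2D point "environment," and the random-sampling planner are all hypothetical stand-ins chosen for illustration, and the dynamics are hand-written rather than learned from video.

```python
import random

def encode(observation):
    """Understanding: turn a raw observation into a state.
    Assumption: the observation is already a 2D position."""
    x, y = observation
    return (float(x), float(y))

def predict(state, action):
    """Predicting: estimate the next state for a given action.
    Assumption: an action is a small displacement (dx, dy)."""
    return (state[0] + action[0], state[1] + action[1])

def plan(state, goal, horizon=5, num_candidates=200, rng=None):
    """Planning: sample candidate action sequences, roll each one out
    with predict(), and keep the one whose final state is closest to goal."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is repeatable
    best_actions, best_cost = None, float("inf")
    for _ in range(num_candidates):
        actions = [(rng.uniform(-1, 1), rng.uniform(-1, 1))
                   for _ in range(horizon)]
        s = state
        for a in actions:
            s = predict(s, a)
        # Squared distance between the predicted final state and the goal.
        cost = (s[0] - goal[0]) ** 2 + (s[1] - goal[1]) ** 2
        if cost < best_cost:
            best_actions, best_cost = actions, cost
    return best_actions, best_cost

# Usage: plan five moves from the origin toward a goal position.
state = encode((0, 0))
actions, cost = plan(state, goal=(3.0, 2.0))
print(len(actions), cost)
```

The point of the sketch is the division of labor: the planner never touches the real environment while deciding, it only queries the predictor. In a real system the predictor would be a learned model, and the quality of the plan depends on how well that model anticipates the world.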
How V-JEPA 2 Works
Trained on video data, V-JEPA 2 has learned patterns about how people interact with objects, how objects move, and how they affect each other. This training enables robots to perform tasks such as reaching for objects, picking them up, and placing them elsewhere.
Lab tests show that robots equipped with V-JEPA 2 can adapt to situations they have not previously encountered, a crucial capability for real-world applications.
New Benchmarks for AI Research
Alongside V-JEPA 2, three new benchmarks have been released to help researchers evaluate how well their models learn and reason about the physical world from video. These benchmarks provide a standard for measuring progress in AI’s ability to understand and predict physical interactions.
Sharing these tools is intended to accelerate research and development and to push AI systems toward greater usefulness and reliability in everyday tasks.