Physical AI / embodied world models

Meta V-JEPA 2

V-JEPA 2 is Meta’s video-based world model for understanding, predicting, and planning in physical environments. It is oriented toward robot reasoning and physical intuition rather than generating walkable 3D worlds for creators.

Research · Paper / research · Updated 2026-06-09


Overview

Status	Research
Access	Paper / research
Released	2025
Inputs	video, image goals, robot observations
Outputs	latent predictions, planning signals, robot action support
Best for	physical reasoning, robot planning, video understanding, embodied AI research

Why it matters

V-JEPA 2 helps define the embodied-AI side of the world model category: predicting outcomes, planning actions, and understanding physical dynamics.
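The prediction-plus-planning loop mentioned above can be illustrated generically. In goal-conditioned planning within a learned latent space, candidate action sequences are rolled through a predictor. Each candidate is scored by the distance between its predicted latent and the latent of a goal image, and the search is then refined. The toy sketch below refines with the cross-entropy method (CEM). It is a hypothetical illustration, not Meta's code or API. `toy_predictor` stands in for a learned network, latents are 2-D lists, the L1 energy and all hyperparameters are assumptions, and the names `cem_plan`, `rollout`, and `l1_energy` are invented for this example.

```python
import random
from typing import List

Vector = List[float]


def toy_predictor(z: Vector, action: Vector) -> Vector:
    # Hypothetical stand-in for a learned action-conditioned predictor:
    # here the next latent is simply the current latent shifted by the action.
    return [zi + ai for zi, ai in zip(z, action)]


def rollout(z0: Vector, actions: List[Vector]) -> Vector:
    # Predict the final latent after applying each action in turn.
    z = z0
    for a in actions:
        z = toy_predictor(z, a)
    return z


def l1_energy(z: Vector, z_goal: Vector) -> float:
    # Lower is better: L1 distance between predicted and goal latents.
    return sum(abs(a - b) for a, b in zip(z, z_goal))


def cem_plan(z0, z_goal, horizon=3, dim=2, samples=64, elites=8, iters=20, seed=0):
    # Cross-entropy method: sample action sequences from per-step Gaussians,
    # keep the lowest-energy "elite" sequences, then refit the Gaussians to them.
    rng = random.Random(seed)
    mean = [[0.0] * dim for _ in range(horizon)]
    std = [[1.0] * dim for _ in range(horizon)]
    for _ in range(iters):
        scored = []
        for _ in range(samples):
            seq = [[rng.gauss(mean[t][d], std[t][d]) for d in range(dim)]
                   for t in range(horizon)]
            scored.append((l1_energy(rollout(z0, seq), z_goal), seq))
        scored.sort(key=lambda pair: pair[0])
        top = [seq for _, seq in scored[:elites]]
        for t in range(horizon):
            for d in range(dim):
                vals = [seq[t][d] for seq in top]
                m = sum(vals) / len(vals)
                var = sum((v - m) ** 2 for v in vals) / len(vals)
                mean[t][d] = m
                std[t][d] = max(var ** 0.5, 1e-3)  # floor keeps sampling from collapsing to zero
    return mean  # the refined mean action sequence is the plan


z0 = [0.0, 0.0]       # hypothetical latent of the current observation
z_goal = [1.5, -0.6]  # hypothetical latent of the goal image
plan = cem_plan(z0, z_goal)
print("first action:", plan[0])
print("remaining energy:", l1_energy(rollout(z0, plan), z_goal))
```

In a receding-horizon (model-predictive control) loop, only `plan[0]` would be executed. The new observation would then be re-encoded and the plan recomputed. CEM is used here because it needs only forward evaluations of the predictor and no gradients. Gradient-based planners are a common alternative.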

Roamscape use

Tracked as a scientific world model reference for physical AI and embodied reasoning.

Strengths

physical-world prediction
planning orientation
robotics relevance
strong research framing

Limitations

not a 3D world generator
not a consumer creator tool
outputs are not exportable worlds

Sources

Meta: V-JEPA 2 announcement
InfoQ: V-JEPA 2 coverage

Related models

NVIDIA · Cosmos

World foundation models for physical AI, robotics, and synthetic data.