Interactive simulation world models

NVIDIA SANA-Streaming

SANA-Streaming is a system-algorithm co-designed framework for high-resolution, real-time streaming video editing. A hybrid Diffusion Transformer interleaves efficient GDN blocks with softmax attention for local source alignment, enabling causal video-to-video edits over minute-long streams while preserving motion and non-edited regions. NVIDIA, MIT, THU, NUS, and HKU researchers report 1280×704 output at 24 end-to-end FPS on a single RTX 5090, with the DiT core reaching 58 FPS. A public interactive demo is hosted on Reactor.

Tracked · Public demo · Updated 2026-06-11

Overview

Status: Tracked
Access: Public demo
Released: 2026
Inputs: source video stream, text edit prompt, local edit regions, background replacement prompts, style-transfer prompts
Outputs: real-time edited video stream, 1280×704 streaming output, temporally consistent V2V frames, physical-AI augmented sensor video
Best for: real-time streaming video editing, live broadcast and gaming overlays, style transfer on video streams, autonomous-driving sensor augmentation, robotics egocentric video transformation, minute-length causal V2V, physical-AI synthetic data

Why it matters

SANA-Streaming sits between live world models and classical video editors: it transforms existing video streams in real time with prompt control, temporal consistency, and physical-AI use cases such as autonomous-driving sensor augmentation and egocentric robotics sim-to-real. It expands Roamscape’s Train and Play index beyond generative 3D into streaming V2V editing.

Roamscape use

Tracked in the model index under Train (physical AI) and Play (streaming V2V). Try the public Reactor demo; native Roamscape integration will follow when API access is available.

Strengths

  • 24 FPS end-to-end on a single RTX 5090
  • minute-length causal streaming edits
  • preserves source motion and unedited content
  • hybrid DiT with GDN + softmax attention
  • cycle-reverse regularization for temporal consistency
  • strong physical-AI demo coverage (AV, robotics)
  • public Reactor-hosted demo
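
The reported throughput figures (24 FPS end to end, 58 FPS for the DiT core, 1280×704 output) can be turned into rough per-frame budgets. The sketch below is plain arithmetic on those numbers, not code from the SANA-Streaming release. The variable names are illustrative, and the "other stages" figure assumes the stages run back to back with no overlap, which the source does not state; a pipelined system could split the time differently.

```python
def frame_time_ms(fps: float) -> float:
    """Per-frame time budget in milliseconds for a given frame rate."""
    return 1000.0 / fps


# Figures reported for a single RTX 5090.
END_TO_END_FPS = 24   # full pipeline
DIT_CORE_FPS = 58     # DiT core only
WIDTH, HEIGHT = 1280, 704

end_to_end_ms = frame_time_ms(END_TO_END_FPS)
dit_core_ms = frame_time_ms(DIT_CORE_FPS)

# Assumption (not from the source): stages run sequentially without overlap,
# so everything outside the DiT core gets the remainder of the frame budget.
other_stages_ms = end_to_end_ms - dit_core_ms

# Output pixel rate and the number of frames in a one-minute stream.
pixels_per_second = WIDTH * HEIGHT * END_TO_END_FPS
frames_per_minute = END_TO_END_FPS * 60

print(f"end-to-end budget: {end_to_end_ms:.1f} ms/frame")
print(f"DiT core:          {dit_core_ms:.1f} ms/frame")
print(f"other stages:      {other_stages_ms:.1f} ms/frame (sequential assumption)")
print(f"pixel rate:        {pixels_per_second:,} px/s")
print(f"one-minute stream: {frames_per_minute} frames")
```

Under the sequential assumption, the DiT core uses less than half of each frame's budget, so the gap between 58 and 24 FPS is spent in the rest of the pipeline.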

Limitations

  • video-to-video editing, not exportable 3D worlds
  • requires a source video stream rather than text-only generation
  • not integrated in Roamscape /live yet
  • hardware targets high-end consumer GPUs (RTX 5090 class)
  • research system; production terms depend on the host
