Comparison to Baselines
A dog is jumping into a river. → A horse is jumping into a river.
Input video
|
DreamMotion w/ Zeroscope
|
Tune-A-Video
|
ControlVideo
|
Masked Regions
|
Control-A-Video
|
Gen-1
|
TokenFlow
|
A seagull is walking. → A duck is walking on the mud.
Input video
|
DreamMotion w/ Zeroscope
|
Tune-A-Video
|
ControlVideo
|
Masked Regions
|
Control-A-Video
|
Gen-1
|
TokenFlow
|
A car is driving on the road. → A lamborghini is driving on the road, on sunset.
Input video
|
DreamMotion w/ Zeroscope
|
Tune-A-Video
|
ControlVideo
|
Masked Regions
|
Control-A-Video
|
Gen-1
|
TokenFlow
|
A man is skateboarding. → A firefighter is skateboarding.
Input video
|
Masked Regions
|
|
DreamMotion w/ Show-1
|
DDIM inversion + Word swap
|
VMC
|
Cars are running on the bridge. → Buses are running on the bridge.
Input video
|
Masked Regions
|
|
DreamMotion w/ Show-1
|
DDIM inversion + Word swap
|
VMC
|
Additional Comparisons to Baselines (Video-P2P[1], DMT[2])
A seagull is walking. → A flamingo is walking on the grass.
Input video
|
|
DreamMotion w/ Zeroscope
|
Video-P2P
|
DMT
|
A car is driving on the road. → A lamborghini is driving on the road, on sunset.
Input video
|
|
DreamMotion w/ Zeroscope
|
Video-P2P
|
DMT
|
A man is walking a dog on the road. → A child is walking a pig on the road.
Input video
|
|
DreamMotion w/ Zeroscope
|
Video-P2P
|
DMT
|
A man is walking a dog on the road. → A woman is walking a tiger on the road.
Input video
|
|
DreamMotion w/ Zeroscope
|
Video-P2P
|
DMT
|
A car is driving on the road under the sky. → A school bus is driving on the road under aurora.
Input video
|
|
DreamMotion w/ Zeroscope
|
Video-P2P
|
DMT
|
A car is driving on the road under the sky. → A truck is driving on the road under fireworks.
Input video
|
|
DreamMotion w/ Zeroscope
|
Video-P2P
|
DMT
|
[1] Liu, Shaoteng, et al. "Video-P2P: Video editing with cross-attention control." CVPR 2024.
[2] Yatim, Danah, et al. "Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer." CVPR 2024.
Additional Video Style Transfer Results
Input video
|
|
→ Pixel art
|
→ Watercolor painting
|
• Sterling, Spencer. Zeroscope. https://huggingface.co/cerspense/zeroscope_v2_576w (2023).
• Zhang, David Junhao, et al. "Show-1: Marrying pixel and latent diffusion models for text-to-video generation." arXiv preprint arXiv:2309.15818 (2023).
• Wu, Jay Zhangjie, et al. "Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation." ICCV 2023.
• Zhang, Yabo, et al. "Controlvideo: Training-free controllable text-to-video generation." ICLR 2024.
• Chen, Weifeng, et al. "Control-a-video: Controllable text-to-video generation with diffusion models." arXiv preprint arXiv:2305.13840 (2023).
• Esser, Patrick, et al. "Structure and content-guided video synthesis with diffusion models." ICCV 2023.
• Geyer, Michal, et al. "Tokenflow: Consistent diffusion features for consistent video editing." ICLR 2024.
• Jeong, Hyeonho, et al. "VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models." CVPR 2024.