Microsoft AI creates talking deepfakes from single photo

Microsoft Research Asia has released an AI model that can generate realistic, talking deepfake videos from a single still image and an audio track.

The model was trained on footage of approximately 6,000 talking faces from the VoxCeleb2 dataset. Given a still image, it can animate the face to lip-sync with a supplied voice track, producing realistic facial expressions and natural head movements.

The technology, called VASA-1, can reportedly generate lip-synced video at a resolution of 512x512 pixels and up to 40 frames per second with negligible starting latency.
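
To make the input/output shape concrete, the sketch below frames that pipeline as a function that takes a single face photo plus an audio waveform and returns 512x512 frames at 40 fps. Microsoft has not published VASA-1 code or a public API, so every name and signature here is hypothetical and the model call is a stand-in placeholder, not the actual method.

# Illustrative sketch only: Microsoft has not released VASA-1 code or an API,
# so the generator below is a placeholder. All names are hypothetical.
import numpy as np

RESOLUTION = 512   # VASA-1 reportedly outputs 512x512 video
FPS = 40           # reported generation rate, up to 40 frames per second

def generate_talking_head(face_image: np.ndarray,
                          audio_waveform: np.ndarray,
                          sample_rate: int) -> np.ndarray:
    """Placeholder for a single-image talking-head generator.

    A real model would condition on the audio to drive lip motion,
    facial expression, and head pose; here we simply repeat the input
    frame for the duration of the audio to show the expected shapes.
    """
    duration_s = len(audio_waveform) / sample_rate
    n_frames = int(round(duration_s * FPS))
    # Output shape: (frames, height, width, channels)
    return np.repeat(face_image[np.newaxis, ...], n_frames, axis=0)

if __name__ == "__main__":
    face = np.zeros((RESOLUTION, RESOLUTION, 3), dtype=np.uint8)  # stand-in photo
    audio = np.zeros(16000 * 5, dtype=np.float32)                 # 5 s of silence at 16 kHz
    video = generate_talking_head(face, audio, sample_rate=16000)
    print(video.shape)  # (200, 512, 512, 3): 5 s at 40 fps

The point of the sketch is only the interface: one photo and one audio track in, a fixed-resolution frame sequence out at the reported frame rate.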

Photo credit: Microsoft Research Asia