VASA-1 takes a single portrait photo and an audio file and converts them into a hyper-realistic talking-face video complete with lip sync ... to audio-driven animation is also similar to a ...