https://omnihuman-lab.github.io/
We propose an end-to-end multimodality-conditioned human video generation framework named OmniHuman, which can generate human videos based on a single human image and motion signals (e.g., audio only, video only, or a combination of audio and video). …
* The project site has many examples.
* OmniHuman-1: Taylor Swift singing Naruto in Japanese; and some nostalgia, with French!