Video-to-Video Translation for Visual Speech Synthesis

05/28/2019
by Michail C. Doukas, et al.

Despite the remarkable success of image-to-image translation, driven by advances in generative adversarial networks (GANs), very few attempts have been made at translation in the video domain. We study the task of video-to-video translation in the context of visual speech generation, where the goal is to transform an input video of any spoken word into an output video of a different word. This is a multi-domain translation problem, where each word forms a domain of videos in which that word is uttered. Adapting the state-of-the-art image-to-image translation model (StarGAN) to this setting falls short when the vocabulary is large. Instead, we propose to use character encodings of the words and design a novel character-based GAN architecture for video-to-video translation, called Visual Speech GAN (ViSpGAN). We are the first to demonstrate video-to-video translation with a vocabulary of 500 words.
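The scalability argument hinges on how the target word is represented: a one-hot label over the vocabulary (as in a StarGAN-style domain code) grows linearly with vocabulary size, whereas a per-character encoding stays fixed. The sketch below illustrates one plausible such encoding, assuming a lowercase alphabet and a fixed maximum word length; the names (ALPHABET, MAX_LEN, encode_word) are illustrative and not taken from the paper.

```python
# Minimal sketch of character-based word conditioning (illustrative only;
# the paper's exact encoding may differ).
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
MAX_LEN = 12  # assumed maximum word length; shorter words are zero-padded

def encode_word(word: str) -> np.ndarray:
    """One-hot encode each character, giving a (MAX_LEN, |ALPHABET|) matrix.

    Unlike a one-hot vector over the whole vocabulary, whose size grows
    with the number of words, this representation is fixed-size and shares
    structure between words (e.g. 'cat' and 'car' differ in a single row).
    """
    enc = np.zeros((MAX_LEN, len(ALPHABET)), dtype=np.float32)
    for i, ch in enumerate(word.lower()[:MAX_LEN]):
        enc[i, ALPHABET.index(ch)] = 1.0
    return enc

# A one-hot word label for a 500-word vocabulary needs a 500-dim vector,
# and grows with every new word; the encoding above is 12 x 26 regardless.
code = encode_word("speech")
print(code.shape)  # (12, 26)
```

Under this kind of scheme, the conditioning input to the generator has the same shape for any vocabulary, which is one reading of why a character-based architecture scales to 500 words where a per-word domain label does not.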
