Tencent introduced a new artificial intelligence (AI) model on Tuesday capable of bringing still portrait images to life through animation. Named HunyuanPortrait, the model uses a diffusion-based architecture to generate realistic animated videos from a reference image and a driving video. Developers involved in the project emphasized the model's ability to capture fine-grained facial detail and head movements, and to transfer those elements seamlessly onto the static image.
Tencent’s HunyuanPortrait Can Bring Still Portraits to Life
In a recent announcement on X (formerly known as Twitter), Tencent Hunyuan’s official account shared that the HunyuanPortrait model is now accessible to the public. Interested users can download the AI model from Tencent’s GitHub and Hugging Face repositories. Furthermore, a preprint paper describing the model’s specifications is available on arXiv. The model is intended for academic and research purposes, not for commercial applications.
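For those fetching the weights programmatically, a minimal sketch using the huggingface_hub client is shown below. The repo id is an assumption for illustration only; check Tencent's Hugging Face page for the exact identifier before running it.

```python
# Minimal sketch: download the released checkpoints with huggingface_hub.
# The repo_id below is an assumption for illustration; confirm the exact
# identifier on Tencent's Hugging Face page.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="tencent/HunyuanPortrait",  # assumed repo id
    local_dir="./hunyuanportrait",
)
print(f"Checkpoints saved to {local_dir}")
```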
HunyuanPortrait generates lifelike animated videos from a reference image paired with a driving video. It captures detailed facial expressions and head movements from the driving video and translates them onto the still portrait. The company asserts that the motion synchronization is precise, with even subtle shifts in facial expression accurately reflected.
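To make the input/output contract concrete, here is a rough sketch of the per-frame workflow, using OpenCV for video I/O. The apply_motion function is a hypothetical placeholder standing in for the model, not Tencent's actual API, and the input file names are assumed.

```python
# Sketch of the reference-image + driving-video workflow (OpenCV for I/O).
# apply_motion is a placeholder for the actual model; here it just returns
# the reference frame so the loop stays runnable.
import cv2

def animate(reference_path: str, driving_path: str, output_path: str) -> None:
    reference = cv2.imread(reference_path)   # still portrait (identity source)
    cap = cv2.VideoCapture(driving_path)     # motion source
    fps = cap.get(cv2.CAP_PROP_FPS)
    h, w = reference.shape[:2]
    writer = cv2.VideoWriter(
        output_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h)
    )

    while True:
        ok, driving_frame = cap.read()
        if not ok:
            break
        # In the real model, the driving frame's expression and head pose
        # are extracted and transferred onto the reference portrait.
        writer.write(apply_motion(reference, driving_frame))

    cap.release()
    writer.release()

def apply_motion(reference, driving_frame):
    return reference  # placeholder: no actual motion transfer happens here

animate("portrait.png", "driver.mp4", "out.mp4")  # assumed input files
```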
HunyuanPortrait architecture (Photo Credit: Tencent)
On the model's website, Tencent's researchers elaborated on the HunyuanPortrait architecture, which is built on the Stable Diffusion framework combined with a condition control encoder. Pre-trained encoders decouple the motion data in a driving video from the subject's identity; the extracted motion then serves as control signals that are injected into the still portrait via a denoising UNet. Tencent claims this approach improves both spatial accuracy and temporal consistency in the resulting animations.
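As a loose illustration of this conditioning pattern (not Tencent's actual code), the PyTorch sketch below shows a control encoder producing a motion signal that is fused with noisy latents inside a toy denoiser. Every module, shape, and name is an assumption made for the example.

```python
# Toy PyTorch sketch of the conditioning pattern described above: a control
# encoder turns a driving frame into motion signals that are injected into
# a denoising network alongside the noisy latents. All shapes and names are
# illustrative assumptions, not Tencent's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ControlEncoder(nn.Module):
    """Encodes motion cues (expression, head pose) from a driving frame,
    kept separate from the reference portrait's identity features."""
    def __init__(self, in_ch: int = 3, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.SiLU(),
        )

    def forward(self, driving_frame: torch.Tensor) -> torch.Tensor:
        return self.net(driving_frame)

class TinyDenoiser(nn.Module):
    """Stand-in for the denoising UNet: predicts the noise residual from
    noisy latents fused with the motion control signal."""
    def __init__(self, latent_ch: int = 4, ctrl_dim: int = 64):
        super().__init__()
        self.fuse = nn.Conv2d(latent_ch + ctrl_dim, 64, 3, padding=1)
        self.out = nn.Conv2d(64, latent_ch, 3, padding=1)

    def forward(self, noisy_latents: torch.Tensor,
                control: torch.Tensor) -> torch.Tensor:
        # Resize the control signal to latent resolution, then fuse by concat.
        control = F.interpolate(control, size=noisy_latents.shape[-2:])
        h = F.silu(self.fuse(torch.cat([noisy_latents, control], dim=1)))
        return self.out(h)

# One denoising step on dummy tensors (batch of 1, 64x64 latents).
encoder, denoiser = ControlEncoder(), TinyDenoiser()
driving_frame = torch.randn(1, 3, 256, 256)  # frame from the driving video
noisy_latents = torch.randn(1, 4, 64, 64)    # diffusion latents of the portrait
noise_pred = denoiser(noisy_latents, encoder(driving_frame))
print(noise_pred.shape)  # torch.Size([1, 4, 64, 64])
```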
Tencent asserts that its AI model surpasses existing open-source alternatives on metrics for temporal consistency and controllability; however, these claims have not been independently verified.
This technology could significantly benefit the filmmaking and animation sectors. Traditionally, animators have relied on manual keyframing of facial expressions or costly motion-capture setups to achieve realistic character animation. With models like HunyuanPortrait, animators can simply provide character designs along with the desired movements and expressions, and the model generates the output. Such models hold promise for democratizing high-quality animation, making it more accessible to smaller studios and independent creators.