Alibaba upgrades HappyHorse video AI, cuts generation costs by 25%

  • Version 1.1 improves motion realism, character consistency and audio generation while addressing user complaints about stiff movements
  • The updated model supports up to nine reference images and lowers 1080p generation costs as competition in AI video intensifies

Alibaba on June 22 unveiled HappyHorse 1.1, the latest version of its video generation model, introducing upgrades across motion quality, character consistency, instruction following, visual realism and audio capabilities.

The most significant improvement centers on motion generation. The company said the update addresses user feedback that the previous version produced movements that appeared slow or lacked energy.

HappyHorse 1.1 enhances motion modeling and temporal consistency, resulting in smoother and more convincing action sequences in scenarios such as dancing and combat.

The upgrade is aimed at reducing common issues in AI-generated video, including motion distortion and ghosting effects that can occur during fast-moving scenes.

Character consistency

Character consistency has also been strengthened. Users can now upload up to nine reference images simultaneously, allowing the model to combine product details, branding elements and environmental settings while maintaining stable character appearances throughout a video.

The feature is designed for applications such as short-form dramas, livestream e-commerce and advertising campaigns, where keeping characters visually consistent remains a key challenge for AI-generated content.

Alibaba also said the new version improves image quality by reducing the overly glossy skin textures and excessive sharpening effects that some users reported in earlier releases.

The model now preserves details such as pores, wrinkles and skin imperfections, producing a more natural appearance suited to commercial advertising and narrative content.

Upgraded audio output

Audio generation has also been upgraded. Dialogue can now adjust pacing, pauses and tone according to scene context and emotional cues, while users can specify background sounds and environmental effects directly through prompts.

The technical specifications remain unchanged from version 1.0. Videos can run from three to 15 seconds at either 720p or 1080p resolution, with flexible aspect ratios.

Lower pricing

Alibaba also lowered pricing for the higher-resolution option. Generation costs for 1080p video have been reduced from 1.6 yuan per second to 1.2 yuan per second, a 25% decrease.
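
As a rough illustration of what the cut means for a single clip, the arithmetic can be sketched in a few lines of Python. The two per-second rates and the three-to-15-second range come from the announcement. The function names are hypothetical, and the sketch assumes billing scales linearly with clip length, with no minimum charge or rounding, which the announcement does not spell out.

```python
from decimal import Decimal

# Per-second 1080p rates in yuan, as stated in the announcement.
OLD_RATE = Decimal("1.6")
NEW_RATE = Decimal("1.2")

# Supported clip lengths in seconds, as stated in the announcement.
MIN_SECONDS = 3
MAX_SECONDS = 15


def clip_cost(seconds: int, rate: Decimal) -> Decimal:
    """Cost of one clip, ASSUMING simple linear per-second billing."""
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError("clip length must be between 3 and 15 seconds")
    return rate * seconds


def percent_decrease(old: Decimal, new: Decimal) -> Decimal:
    """Relative price drop from old to new, as a percentage."""
    return (old - new) / old * 100


if __name__ == "__main__":
    print("price cut (%):", percent_decrease(OLD_RATE, NEW_RATE))
    print("15 s clip, old rate (yuan):", clip_cost(15, OLD_RATE))
    print("15 s clip, new rate (yuan):", clip_cost(15, NEW_RATE))
```

Under that assumption, a maximum-length 15-second clip at 1080p falls from 24 yuan to 18 yuan. Decimal values are used rather than floats so the 25% figure comes out exact.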

Since its initial release, HappyHorse has been adopted across a range of content-production use cases, including short dramas, e-commerce advertising, brand marketing and game cinematics.

The latest version is now available through the HappyHorse website as well as Alibaba Cloud’s Bailian platform and Qwen Cloud.