Baidu’s MuseStreamer AI Video Generation Model Takes on Google’s Veo 3 With Native Audio Support: Report

Alongside MuseStreamer, Baidu reportedly also launched its video creation platform, HuiXiang.

Written by Akash Dutta, Edited by Siddharth Suvarna | Updated: 4 July 2025 18:05 IST

Photo Credit: Reuters

The MuseStreamer AI model can generate videos in up to 1080p resolution

Highlights
  • MuseStreamer is said to generate videos with Chinese dialogues
  • It reportedly scored 89.38 percent on the VBench I2V benchmark
  • MuseSteamer allows users to generate a 10-second-long video
Baidu reportedly released a new artificial intelligence (AI) video generation model on Wednesday. As per the report, the MuseStreamer AI model can also integrate Chinese audio into the generated videos, making it the second model with native audio support after Google's Veo 3. The tech giant claims it is the world's first AI model with native Chinese audio generation support. Alongside the introduction of the video generation model, the company reportedly also launched a new video content creation platform dubbed HuiXiang. Notably, neither MuseStreamer nor HuiXiang is currently available outside of China.

Baidu's MuseStreamer Can Reportedly Generate Chinese Audio

The world of AI video generation models has evolved significantly in the last two years. We have moved from models that struggled to generate people with the correct number of fingers to models that can now accurately depict realistic physics and motion. However, one area most AI players have refrained from entering is video generation with native audio support.

At Google I/O 2025, Google became the first company to offer this capability with Veo 3, which immediately became the talk of the town, leaving its biggest rival, OpenAI's Sora, behind. The Mountain View-based tech giant recently expanded Veo 3 to all 154 countries where the Gemini app is available, highlighting the company's aggressive push for this tool.

However, according to a Tech in Asia report (via AI Base), Chinese tech giant Baidu has also entered the race with its MuseStreamer AI model. It is said to generate videos with Chinese audio and to be the only model capable of doing so. Notably, Veo 3 can only generate audio in the English language.

MuseStreamer can reportedly generate dialogues that are synced with the videos, and it can also add sound effects and ambient noise. Baidu is said to have claimed that the model achieved a score of 89.38 percent on the VBench I2V benchmark, ranking at the top. The tech giant is pitching the model as a content creation tool for consumers.

Alongside the AI model, Baidu has reportedly also launched a new video content platform dubbed HuiXiang. HuiXiang is said to serve as the front end for the AI model, where users can share prompts and generate videos. The platform currently supports 10-second-long video generations at 1080p resolution, the report stated. In comparison, Veo 3 can generate only eight-second-long videos. There is no clarity over the default aspect ratio of the videos, or whether users can generate videos in different aspect ratios.

