Alibaba Wan2.7-Video Released: Can Chinese AI Video Generation Finally Compete with Sora?

I spent last night tinkering with Alibaba’s newly released Wan2.7-Video, and honestly, it exceeded my expectations.

Not in a “holy cow, this is amazing” way, but more like “wait, Chinese AI video can actually do this?” After getting used to Kling AI, my expectations for Chinese video generation were already pretty high. Still, Wan2.7-Video has distinct strengths in certain details.

First, let’s talk about its core selling point: multimodal instruction parsing. What does this mean? You can generate video from more complex descriptions: not just “a cat running” but “an orange cat running on a beach at sunset, shot from a low angle, in slow motion.”

I tried several prompts and found that its grasp of cinematographic language is much stronger than in previous versions. For example, when I asked for “handheld camera shake,” it gave me that slight jitter instead of a perfectly stable shot. This “imperfection” makes the video look more realistic.
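
One way to picture the difference between a simple prompt and a “cinematography-aware” one is to split the description into slots: subject, action, setting, camera, and motion. The sketch below is only an illustration of that idea. `ShotPrompt` and its field names are hypothetical, not part of any Wan2.7-Video API, and the result is just a plain text string you would paste into the prompt box.

```python
from dataclasses import dataclass


@dataclass
class ShotPrompt:
    """Hypothetical container for the parts of a video prompt (not a real API)."""
    subject: str
    action: str
    setting: str = ""
    camera: str = ""
    motion: str = ""

    def render(self) -> str:
        # Subject and action form the core clause; the setting is appended to it.
        core = f"{self.subject} {self.action}"
        if self.setting:
            core = f"{core} {self.setting}"
        # Cinematography hints follow as comma-separated modifiers.
        modifiers = [part for part in (self.camera, self.motion) if part]
        return ", ".join([core, *modifiers])


simple = ShotPrompt(subject="a cat", action="running")
detailed = ShotPrompt(
    subject="an orange cat",
    action="running",
    setting="on a beach at sunset",
    camera="shot from a low angle",
    motion="in slow motion",
)

print(simple.render())    # a cat running
print(detailed.render())  # an orange cat running on a beach at sunset, shot from a low angle, in slow motion
```

Keeping the camera and motion hints in their own slots makes it easy to swap in something like “handheld camera shake” while holding the rest of the scene constant, which is roughly how I would compare one cinematography instruction against another.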

But hold on, let’s look at the numbers. Here is how generation speed and quality compared in my testing:

  • Wan2.7-Video takes about 15 seconds to generate a 5-second video (on an A100)
  • Kling AI takes about 12 seconds
  • On quality, Wan2.7-Video renders slightly more detailed textures, especially for complex materials like water and fur
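
To put those two timings in proportion, here is the arithmetic as a small sketch. The numbers are the rough figures quoted above, not a controlled benchmark, and the hardware behind the Kling AI timing is not stated, so treat the comparison as indicative only. The function names are my own.

```python
# Figures quoted above: rough timings from one informal test, not a benchmark.
CLIP_SECONDS = 5.0
GENERATION_SECONDS = {"Wan2.7-Video": 15.0, "Kling AI": 12.0}


def generation_per_video_second(generation_seconds: float, clip_seconds: float) -> float:
    """Seconds of generation time spent per second of finished video."""
    return generation_seconds / clip_seconds


def relative_slowdown(slower: float, faster: float) -> float:
    """How much longer `slower` takes than `faster`, as a fraction of `faster`."""
    return (slower - faster) / faster


for name, seconds in GENERATION_SECONDS.items():
    ratio = generation_per_video_second(seconds, CLIP_SECONDS)
    print(f"{name}: {ratio:.1f}s of generation per second of video")

gap = relative_slowdown(GENERATION_SECONDS["Wan2.7-Video"], GENERATION_SECONDS["Kling AI"])
print(f"Wan2.7-Video is about {gap:.0%} slower than Kling AI")
```

That works out to 3.0 versus 2.4 seconds of generation per second of video, or about 25% slower for Wan2.7-Video: noticeable, but not a deal-breaker if the texture quality holds up.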

However, Wan2.7-Video’s biggest advantage may not be standalone performance but its integration with Alibaba’s ecosystem. Think about it: Alibaba has Taobao, Youku, and DingTalk, and all of them have built-in demand for video generation. E-commerce product videos, short marketing videos, internal corporate training: these are all real application scenarios.

Speaking of application scenarios, this is a case of “Sora stumbled, Chinese tools feasted.” OpenAI officially shut down Sora at the end of March, just 6 months after launch. The official explanation was “strategic adjustment,” but the industry consensus is that commercialization didn’t work out.

This is quite interesting. Sora’s technical strength is not in question, but it was too expensive and too slow. Regular users couldn’t afford it, and professional users couldn’t wait for it. In contrast, Kling AI has already reached $300 million in ARR (annual recurring revenue), which suggests that Chinese tools are on firmer footing when it comes to commercialization.

My personal feeling is that in the AI video generation race, technology is only part of the story. What matters more is who finds paying use cases first. From this perspective, players with their own ecosystems, like Alibaba and Kuaishou, may have an advantage over pure AI companies.

Of course, Wan2.7-Video isn’t without flaws. In testing, I found that it still produces strange limb distortions in multi-person interaction scenes. This is a common problem across current AI video models: single subjects are fine, but complex interactions tend to fail.

Finally, a question for everyone: now that Sora has shut down, do you think Chinese AI video tools can seize the opportunity to “overtake on the curve”? Or is this just a temporary window, after which OpenAI adjusts its strategy and comes back stronger?

My view is that the window is real, but whether it can be seized depends on whether Chinese tools can get both generation quality and commercialization working at the same time over the next 6 months. The technology gap is narrowing, but closing the gap in brand recognition and user habits will take more time.