Huawei's Silent Move: Kunpeng + Ascend, Building AI Infrastructure Without the Hype
Honestly, when I saw Huawei’s demo at their hardware summit, my first thought was: why isn’t this trending?
I’m not talking about some clickbait headline like “Huawei Creates China’s Most Powerful AI Chip.” What I find genuinely interesting is what they demonstrated: a Kunpeng processor + Ascend chip combo that’s running both LLM training and inference end-to-end—using their own hardware to run open-source Llama 4 and DeepSeek models.
So what’s the significance here?
Let’s look at the numbers first: The Kunpeng 920 processor now supports up to 128 cores, and the Ascend 910B chip delivers 256 TFLOPS (FP16) per card. What does that mean? The A100 hits 312 TFLOPS at FP16, so the Ascend trails by about 18%, which puts the gap within 20%.
20% might sound like a lot, but here’s the thing: Huawei’s entire solution—including chips, servers, and operations—costs about 40% less than buying NVIDIA’s setup. Plus, no waiting in line for stock, no dealing with U.S. export restrictions.
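To make that arithmetic concrete, here is a back-of-the-envelope sketch in Python. It uses only the figures quoted above. The cost side is normalized (NVIDIA setup = 1.0, Huawei setup = 0.6), and applying the whole-solution 40% discount to a per-card comparison is my own simplifying assumption, not something Huawei stated.

```python
# Figures quoted in the article. Costs are normalized units, not real prices.
A100_TFLOPS = 312.0     # FP16 peak per card, as cited
ASCEND_TFLOPS = 256.0   # FP16 peak per card, as cited
COST_DISCOUNT = 0.40    # "about 40% less" for the whole solution (assumed to apply per card)


def relative_gap(baseline: float, other: float) -> float:
    """Fraction by which `other` trails `baseline`."""
    return (baseline - other) / baseline


def perf_per_cost(perf: float, cost: float) -> float:
    """Performance delivered per normalized cost unit."""
    return perf / cost


gap = relative_gap(A100_TFLOPS, ASCEND_TFLOPS)
a100_value = perf_per_cost(A100_TFLOPS, 1.0)
ascend_value = perf_per_cost(ASCEND_TFLOPS, 1.0 - COST_DISCOUNT)

print(f"peak FP16 gap: {gap:.1%}")
print(f"TFLOPS per cost unit: A100 {a100_value:.0f}, Ascend {ascend_value:.0f}")
```

Under these assumptions the raw gap comes out at about 18%, while the cost-normalized figure favors the Ascend (roughly 427 versus 312 TFLOPS per cost unit). Peak TFLOPS is not the same as delivered training throughput, so treat this as a rough sanity check only.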
My personal take? This is exactly what many Chinese companies have been waiting for.
You could call it “domestic substitution,” and you’d be partially right. But Huawei didn’t frame this with grand narratives about “self-reliance” or shout slogans about “breaking through blockades.” They just laid out the technical data matter-of-factly: This solution can run large models, the performance is close to A100, and it’s 40% cheaper. Want to give it a try?
This reminds me of 2024, when Huawei first pushed the Ascend chip. Many online voices said “the performance isn’t there” or “the ecosystem isn’t ready.” I was skeptical too—after all, AI chips aren’t just about hardware. NVIDIA’s CUDA ecosystem took over a decade to build.
But looking at it now, Huawei’s strategy is quite smart: Don’t fight CUDA head-on. Instead, focus on serving domestic LLM vendors.
What does that look like in practice? Huawei goes to the teams behind DeepSeek, Qwen, Ernie Bot, and others with a simple offer: You use our chips, we provide full-stack support from drivers to frameworks. Training and inference, all in one go.
This approach is already showing results. Currently, at least 15 Chinese LLM companies (including several top-tier players) are using Huawei’s Ascend chips for training and inference. Huawei hasn’t officially released the names, but from what I know, DeepSeek already has some inference clusters running on Ascend 910B.
Here’s what’s interesting: Huawei isn’t trying to be “China’s NVIDIA.” It’s positioning itself as “the IBM of the LLM era.”
NVIDIA sells general-purpose GPUs: you can use them for gaming, rendering, or AI training. But Huawei’s Kunpeng + Ascend combo was designed specifically for AI computing from day one. Kunpeng handles general computing (data preprocessing, CPU-bound inference), while Ascend, which is an NPU rather than a GPU, handles acceleration (the training and inference work a GPU would otherwise do).
In principle, this architecture leaves more room for optimization than a general-purpose GPU does, because you know exactly which workloads users will run and can tune for them.
For example: Huawei demoed inference of a 70B-parameter Llama 4 model, achieving 28 tokens per second on Ascend 910B. On A100, that’s roughly 35 tokens per second, so the Ascend is about 20% slower (put the other way, the A100 is 25% faster). But if you factor in the 40% cost difference, the price-performance ratio is actually favorable.
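Here is a rough sketch of why the price-performance math works out. It asks how big a discount the slower card needs just to break even on cost per token, then compares that with the quoted 40%. The throughput numbers are the ones from the demo. Normalized costs and the idea that cost scales linearly with the number of cards are my assumptions for illustration.

```python
# Throughput figures quoted in the article. Costs are normalized, not real prices.
A100_TPS = 35.0      # tokens per second, as cited
ASCEND_TPS = 28.0    # tokens per second, as cited
DISCOUNT = 0.40      # quoted cost advantage (assumed to apply per card)


def cost_per_token(relative_cost: float, tokens_per_second: float) -> float:
    """Normalized cost of generating one token (arbitrary units)."""
    return relative_cost / tokens_per_second


def break_even_discount(baseline_tps: float, candidate_tps: float) -> float:
    """Discount at which the slower card matches the baseline's cost per token."""
    return 1.0 - candidate_tps / baseline_tps


needed = break_even_discount(A100_TPS, ASCEND_TPS)
ratio = cost_per_token(1.0 - DISCOUNT, ASCEND_TPS) / cost_per_token(1.0, A100_TPS)

print(f"discount needed to break even: {needed:.0%}")
print(f"Ascend cost per token relative to A100: {ratio:.0%}")
```

With these inputs the break-even discount is 20%, so a 40% discount clears it comfortably: each token costs about three quarters of what it would on the A100. Power, networking, and engineering time are left out, and any of them could shift the picture.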
Of course, I’m not saying Huawei’s solution is perfect. The biggest challenge remains the ecosystem: CUDA has over a decade of accumulation, with millions of developers worldwide. Huawei’s CANN (Compute Architecture for Neural Networks) ecosystem is still small by comparison.
But here’s the smart part: Huawei isn’t trying to get developers to migrate from CUDA to CANN. Instead, they provide pre-trained models and inference frameworks directly. No code changes needed—just deploy.
That significantly lowers the barrier.
Honestly, I’ve always thought the biggest problem with domestic AI chips isn’t performance—it’s usability. Being slightly slower is fine, as long as developers don’t have to struggle to use it. That’s how you capture market share.
Huawei’s strategy is heading in the right direction.
One final detail: The models Huawei used in their demo were Llama 4 and DeepSeek, both openly released. What does this mean? It suggests Huawei’s solution plays well with open models. You can use your own models, your own data, your own hardware, and keep the entire AI training-inference chain under your control.
For companies sensitive about data privacy (finance, healthcare, government), this is a significant selling point.
I don’t know how far Huawei’s solution will ultimately go, but at least this time, I didn’t see any grandiose claims about “disrupting NVIDIA.” Just a very practical message: “We’re offering an alternative.”
Honestly, that kind of low-key pragmatism is what makes me think they might actually be onto something.