Kimi K2.6 Drops at Midnight: Code Performance Matches GPT-5.4 — and It's Open Source

At 11 PM last night, Moonshot AI dropped a tweet announcing Kimi K2.6 — officially open source.

No press conference. No influencer testimonials. Just a tweet and a GitHub repo.

I almost missed it. But the benchmark numbers in the official release were worth pulling an all-nighter for.

K2.6 scored 54.0% on Humanity’s Last Exam, a PhD-level benchmark that’s considered one of the hardest in AI. On the DeepSearchQA test, which measures an agent’s deep-retrieval capability, it hit 92.5%, crushing GPT-5.4 and Claude Opus 4.6.

But what caught my attention wasn’t the benchmarks.

It was the “300 Agent cluster coordination” capability.

My first thought: wait, 300? Like, simultaneously?

Best way to understand it: imagine breaking one big task into 300 smaller pieces, assigning each piece to a separate Agent, then merging the results. Compared to “one Agent doing your job,” this is a different beast entirely.
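To make that fan-out, fan-in picture concrete, here is a minimal sketch in Python. It is not Kimi's actual orchestration code or API: `run_agent`, `split_task`, and the `max_concurrent` cap are all hypothetical names I made up for illustration, and the "agent" is simulated locally rather than calling any model.

```python
import asyncio


async def run_agent(agent_id: int, subtask: str) -> str:
    # Hypothetical stand-in for one sub-agent. A real system would call a
    # model endpoint here; this sketch only simulates the work.
    await asyncio.sleep(0)
    return f"agent {agent_id} finished: {subtask}"


def split_task(task: str, n: int) -> list[str]:
    # Naive decomposition (an assumption): label n equal slices of the task.
    return [f"{task} [part {i + 1}/{n}]" for i in range(n)]


async def fan_out_fan_in(
    task: str, n_agents: int = 300, max_concurrent: int = 50
) -> list[str]:
    subtasks = split_task(task, n_agents)
    # Cap how many agents run at once so the fan-out stays manageable.
    limiter = asyncio.Semaphore(max_concurrent)

    async def guarded(agent_id: int, subtask: str) -> str:
        async with limiter:
            return await run_agent(agent_id, subtask)

    # Fan out: one coroutine per subtask. Fan in: gather returns the
    # results in the same order as the subtasks.
    return await asyncio.gather(
        *(guarded(i, s) for i, s in enumerate(subtasks))
    )


results = asyncio.run(fan_out_fan_in("summarize repo"))
print(len(results), "results merged")
```

The semaphore is the only "management system" in this toy version. Anything real would also need retries, timeouts, and a smarter merge step than a plain list.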

The upside is obvious: faster execution, higher fault tolerance, finer task decomposition.

But here’s the catch:

One: coordinating 300 Agents is like managing 300 employees without a proper management system. Chaos is inevitable without the right infrastructure.

Two: cost. 300 Agents means roughly 300x token consumption. The official API pricing went up 58% — now it makes sense.
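Here is the back-of-the-envelope math behind that, as a small Python sketch. Only the 58% increase and the 300-agent count come from the announcement. The base price and the per-agent token count are placeholder numbers I picked for illustration, and the calculation assumes the worst case where every sub-agent burns as many tokens as a single agent would.

```python
def estimate_cost(
    tokens_per_agent: int, n_agents: int, price_per_million: float
) -> float:
    # Total cost = tokens used by all agents * price per token.
    return tokens_per_agent * n_agents * price_per_million / 1_000_000


BASE_PRICE = 1.00               # hypothetical price per million tokens
NEW_PRICE = BASE_PRICE * 1.58   # the 58% increase mentioned above
TOKENS_PER_AGENT = 20_000       # hypothetical token usage per agent

single = estimate_cost(TOKENS_PER_AGENT, 1, NEW_PRICE)
cluster = estimate_cost(TOKENS_PER_AGENT, 300, NEW_PRICE)

print(f"single agent: {single:.4f}")
print(f"300 agents:   {cluster:.2f}")
print(f"ratio:        {cluster / single:.0f}x")
```

If the subtasks turn out much smaller than the original task, the real multiplier would be lower than 300x, so treat this as an upper bound rather than a forecast.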

Three: “open source” needs scrutiny. Did they open only the model weights, or the training code, training data, and evaluation methodology too? Still waiting to find out.

That said, the fact that a Chinese model can go head-to-head with GPT-5.4 on code is historic in itself.

Just months ago, domestic models were getting trounced by GPT-4 on coding tasks. Now, at least in this dimension, we’re punching in the same weight class.

As for the 300-Agent cluster — I genuinely want to test this. First day I get an API key, you’ll see a hands-on review.

My honest take: Yang Zhilin just handed in a solid report card.

Will it ace the test? Time will tell.

K2.6 is already live on kimi.com, API included. Go try it. And come back to share your results in the comments.