Google's Gemma 4: A 31B Model Matches Models 20x Larger, and the Small Variants Run on Your Phone

Yesterday on Twitter, I saw something that made me pause: Google released Gemma 4, and the 31B parameter version matches models with 20x the parameters.

My first reaction: “That’s impossible.” Then I checked Arena AI’s leaderboard.

Holy crap, they weren’t kidding. The 31B Dense version ranks #3 on the open-source list, behind only two much larger models.

Here’s what’s interesting: we used to talk about open-source models as “good enough, don’t expect them to compete with closed-source.” Now? Open-source is starting to win.

Four Sizes, From Phone to Server

Gemma 4 launched with four variants: E2B (2.3B), E4B (4.5B), 26B MoE, and 31B Dense.

My take: Google actually wants to grow the open-source ecosystem, not just flex technical muscle.

The 2.3B and 4.5B models run offline on Pixel phones, Raspberry Pi, and NVIDIA Jetson Orin Nano. Translation: you can genuinely put an AI model on your phone. No internet, no cloud, pure local inference.

And they support real-time speech understanding with near-zero latency.

Two years ago, we were debating “can models run on phones?” Now Google’s giving a straight answer: not only can they run, they run well.
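If you want a feel for why these sizes fit on a phone, here’s a rough back-of-the-envelope sketch. It only counts the weights, uses the parameter counts quoted above, and assumes common quantization widths (16, 8, and 4 bits per weight). The function name and the bit widths are my own illustration, not anything from Google’s release, and real memory use will be higher once you add the KV cache and runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9


# Parameter counts as quoted in this post; bit widths are assumed for illustration.
variants = {"E2B": 2.3e9, "E4B": 4.5e9, "31B Dense": 31e9}

for name, params in variants.items():
    sizes = ", ".join(
        f"{bits}-bit: {weight_memory_gb(params, bits):.2f} GB" for bits in (16, 8, 4)
    )
    print(f"{name}: {sizes}")
```

At 4 bits per weight, 2.3B parameters come to roughly 1.15 GB, which is phone territory. The 31B model at the same width is about 15.5 GB, which is not.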

Native Multimodal, Not a Patch Job

Another thing I appreciate: Gemma 4 natively supports image, video, and audio input.

Key word: “native.” Not “add a vision encoder as an afterthought.”

Think of it like building a house. You plan from the foundation how many people will live there and how many rooms you need, rather than building first and adding partitions later.

Native multimodal means the model learns the relationships between modalities from the start, rather than awkwardly stitching “seeing” and “saying” together. For developers, it’s also simpler: no need to build your own bridges.

256K Context + 140+ Languages

A 256K-token context window means the model can process hundreds of thousands of characters, or hours of video, in one go.
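To make that concrete, here’s a rough sketch for guessing whether a document fits. It assumes about 4 characters per token, a common rule of thumb for English text that doesn’t hold for every language or tokenizer, and it treats 256K as 256,000 tokens. Both numbers and the helper names are my assumptions; for real work, count tokens with the model’s own tokenizer.

```python
import math

CONTEXT_TOKENS = 256_000   # assumed reading of "256K"
CHARS_PER_TOKEN = 4.0      # rough rule of thumb for English; an assumption


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Very rough token estimate based on character count alone."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(
    text: str,
    reserved_for_output: int = 2_000,
    context_tokens: int = CONTEXT_TOKENS,
) -> bool:
    """True if the estimated prompt plus room for a reply fits in the window."""
    return estimate_tokens(text) + reserved_for_output <= context_tokens


document = "word " * 100_000  # 500,000 characters of filler text
print(estimate_tokens(document), fits_in_context(document))
```

Under that assumption, 256K tokens works out to roughly a million English characters. Languages that tokenize less efficiently will fit less.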

Support for 140+ languages: I actually verified this one. Mainstream languages like Chinese, English, Japanese, and Korean are covered, plus plenty of smaller ones.

Together, these two features make Gemma 4 incredibly practical. Many open-source models have either a short context or limited language support. Gemma 4 solves both pain points.

Apache 2.0: Actually Usable for Business

Let me highlight this one. Many “open-source” models come with strings attached: commercial use requires payment, or code modifications must be open-sourced, which complicates developers’ lives.

Gemma 4 uses the Apache 2.0 license. You can use it in commercial products and modify it without open-sourcing your changes, as long as you keep the license text and the copyright and attribution notices, and mark any files you changed.

That’s what I call “real open source.” The “open but not for commercial use” crowd? They “open-sourced” nothing.

Can 31B Really Match 620B?

Back to that headline: how does a 31B model match one 20x larger?

My understanding: Google’s probably invested more in “model efficiency” than “model scale” recently.

Simply put: making models “learn smarter,” not just “learn more.” Architecture optimization, training-data curation, and distillation let small models master what previously only large models could.

Like studying a subject. One person memorizes everything (stacking parameters). Another understands principles and applies them (improving efficiency). The latter might “learn less” but isn’t necessarily worse.

Google hasn’t published the details, so I’m speculating. But at least the results show this path works.
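Since distillation is the least self-explanatory of the techniques I mentioned, here’s a minimal sketch of the textbook idea: a small “student” model is trained to match the softened output distribution of a large “teacher.” This is the generic formulation (temperature-scaled softmax plus KL divergence), not Google’s recipe, which isn’t public. The logits are made-up numbers and the function names are mine.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)
    return kl * temperature ** 2


# Hypothetical logits over three candidate tokens.
teacher_logits = [4.0, 1.5, 0.2]
good_student = [3.8, 1.6, 0.3]   # close to the teacher
poor_student = [0.1, 0.2, 3.9]   # disagrees with the teacher

print(distillation_loss(good_student, teacher_logits))
print(distillation_loss(poor_student, teacher_logits))
```

The student that agrees with the teacher gets a much smaller loss. Training pushes that number down, so the small model inherits the large model’s judgment about which answers are nearly right, not just which one is right.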

A New Player in Open Source

Google’s push into open source sends a clear signal: they don’t want Meta (Llama series) dominating open-source discourse.

Meta’s been practically unchallenged in open source for two years. Google had open-source projects, including earlier Gemma releases, but little presence. Gemma 4 is their “no-joke” entry.

For developers, this is great news. Competition brings choices. Choices drive progress.

I can’t predict how the open-source ecosystem will evolve, but Google’s entry definitely makes this race more interesting.

Don’t Get Fooled by Parameter Counts

Real talk: parameter counts used to mean “bigger is better.” Now it’s becoming “smarter is better.”

That 31B matching 620B result? It’s reminding us: stop chasing parameters blindly. Look at actual performance.

Like buying cars. A bigger engine doesn’t mean a better car: you need the right powertrain, tuning, and fit for your needs. Driving a 6.0L SUV in city traffic is worse than driving a 1.5T sedan.

Same with models. What fits your use case matters most.