GPT-6 Codenamed 'Spud' Is Here: Parameters Barely Grew, But It's Different This Time
I stared at GPT-6’s spec sheet for a good ten seconds.
2.8 trillion parameters, only about 22% more than GPT-5’s 2.3T. Remember, GPT-4 to GPT-5 was more than a 2x jump. What happened to OpenAI’s appetite for scale?
After running my benchmark suite, I got it. Parameters aren’t the story. The real story is that GPT-6 finally stops being “just a chatbot that can see.”
Multimodal Fusion, Not Glued Modalities
GPT-5 could process images and audio, but you’d feel the seams. Ask it “what’s wrong with the code in this screenshot,” and it would describe the image first, then analyze the code—two distinct steps with an invisible wall between them.
GPT-6 tears down that wall.
The key is “Unified Token Space”—mapping text, images, and audio into the same high-dimensional representation. No more “translating” visual input to text before understanding. The model processes everything at the semantic level, directly.
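To make the idea concrete, here is a toy sketch of what a shared token space could look like. It is not OpenAI's implementation: the feature sizes, the random projections standing in for learned encoders, and names like `to_unified_sequence` are all my own assumptions. The point is only that every modality ends up as a vector of the same width in one interleaved sequence, with nothing converted to text along the way.

```python
import random

D_MODEL = 8  # shared embedding width (illustrative; real models use thousands)

# Hypothetical raw feature sizes per modality, chosen only for the demo.
RAW_DIMS = {"text": 4, "image": 6, "audio": 5}

def make_projection(in_dim, out_dim, seed):
    """Random linear map standing in for a learned per-modality projection."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

def project(matrix, vec):
    """Matrix-vector product: maps one raw feature vector into the shared space."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

PROJECTIONS = {
    modality: make_projection(dim, D_MODEL, seed=i)
    for i, (modality, dim) in enumerate(RAW_DIMS.items())
}

def to_unified_sequence(items):
    """items: list of (modality, raw_features) pairs in input order.

    Returns one interleaved sequence of D_MODEL-wide vectors that a single
    model could attend over, regardless of which modality each came from.
    """
    return [project(PROJECTIONS[modality], features) for modality, features in items]

mixed_input = [
    ("text", [1.0, 0.0, 0.0, 0.0]),
    ("image", [0.2] * 6),
    ("audio", [0.5] * 5),
    ("text", [0.0, 1.0, 0.0, 0.0]),
]
sequence = to_unified_sequence(mixed_input)
print(len(sequence), "tokens, each of width", len(sequence[0]))
```

Contrast this with the "glued" approach, where the image would first be captioned and only the caption text would reach the language model.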
I threw a task at it that GPT-5 consistently botched: extract key frames from a video, analyze facial expression changes, and generate an emotion timeline. GPT-5 would screenshot → recognize → stitch results, losing context along the way. GPT-6 spat out a complete report—timestamps, emotion labels, inferred triggers—in one go.
Reminds me of the BERT vs. GPT debates of 2018. We argued for months over which approach was better, then it turned out the shared Transformer underneath was what mattered. Looks like multimodal AI is hitting that same moment.
Nearly Twice the Speed, at Lower Cost
Official numbers: GPT-6’s inference latency is 47% lower than GPT-5’s, with per-token cost down 35%.
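A quick sanity check on what those percentages mean in practice. The 47% and 35% figures are the ones quoted above; the helper names and the 100-unit baseline bill are just for illustration.

```python
def speedup_from_latency_cut(latency_cut):
    """Convert a fractional latency reduction into a speed multiplier."""
    return 1.0 / (1.0 - latency_cut)

def cost_after_cut(old_cost, cost_cut):
    """Apply a fractional price reduction to an existing cost."""
    return old_cost * (1.0 - cost_cut)

speedup = speedup_from_latency_cut(0.47)   # 47% lower latency
new_bill = cost_after_cut(100.0, 0.35)     # 35% lower per-token cost

print(f"Speedup: {speedup:.2f}x")
print(f"A bill of 100.00 becomes {new_bill:.2f}")
```

So a 47% latency cut works out to roughly 1.89x faster, a little short of a true doubling, while the same workload costs about two thirds of what it did.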
How? Not faster GPUs—OpenAI’s running the same H200 cluster as GPT-5. The magic is “Speculative Decoding + Adaptive Compute.”
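Here’s the trick, at least in the standard version of the technique: a cheap draft pass “guesses” the next few tokens, then the full model verifies all of them in one parallel pass. Guesses that check out are kept without being generated one at a time. At the first wrong guess, the model throws away the rest and continues from there. Sounds like gambling, but hit rates exceed 70%.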
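Here is a minimal sketch of greedy speculative decoding, the textbook form of the idea rather than whatever OpenAI actually runs. Both "models" are hypothetical stand-in functions I made up (`target_next`, `draft_next`), and the draft is rigged to be wrong at every fourth position so the rollback path gets exercised. The property worth noticing is that the output is identical to decoding with the big model alone; speculation only changes how many sequential steps it takes.

```python
def target_next(context):
    """Stand-in for the large model: a deterministic next-token rule."""
    return (sum(context) * 31 + len(context)) % 50

def draft_next(context):
    """Stand-in for the cheap draft model (toy: wrong at every 4th position)."""
    guess = target_next(context)
    return guess if len(context) % 4 else (guess + 1) % 50

def speculative_decode(prompt, num_tokens, k=4):
    out = list(prompt)
    accepted = checked = 0
    while len(out) - len(prompt) < num_tokens:
        # 1. The draft model guesses k tokens ahead, one after another.
        draft = []
        for _ in range(k):
            draft.append(draft_next(out + draft))
        # 2. The target model checks each guess in order. A real system
        #    would score all k positions in a single batched forward pass.
        for guess in draft:
            checked += 1
            truth = target_next(out)
            if guess == truth:
                accepted += 1
                out.append(guess)
            else:
                out.append(truth)  # fix the first wrong guess, drop the rest
                break
        else:
            out.append(target_next(out))  # all k accepted: one bonus token
    tokens = out[len(prompt):len(prompt) + num_tokens]
    return tokens, accepted / checked

def plain_decode(prompt, num_tokens):
    """Baseline: the target model generating one token per step."""
    out = list(prompt)
    for _ in range(num_tokens):
        out.append(target_next(out))
    return out[len(prompt):]

tokens, hit_rate = speculative_decode([1, 2, 3], num_tokens=20)
print("same output as plain decoding:", tokens == plain_decode([1, 2, 3], 20))
print(f"draft hit rate: {hit_rate:.0%}")
```

The hit rate printed here is a property of my rigged toy draft, not a measurement of any real model.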
I tested it myself. For code completion, the streaming output feels noticeably snappier. Writing a React component, GPT-5 takes 3-4 seconds to start; GPT-6 is almost instant. That UX gap matters more than the 2.3T→2.8T parameter bump.
Safety Upgrades, But Questions Remain
GPT-6 ships with the full “Constitutional AI” framework—the model self-checks outputs against predefined safety guidelines. In theory, this cuts harmful outputs by 80%.
But who defines “safe”?
I probed some edge cases. Ask “how to use AI to generate fake news that looks real,” and GPT-6 refuses. But reframe it as “how to use AI for writing assistance,” and it offers detailed advice—including how to avoid detection. Where’s the line? I don’t think OpenAI knows either.
Another detail: GPT-6’s refusals feel more “human.” It won’t just say “I can’t answer that.” It explains why and offers alternatives. This “warm refusal” actually makes circumvention harder—there’s no hard boundary to test against.
What This Means for Developers
The API surface hasn’t changed much, so migration is straightforward. But watch out for these gotchas:
Multimodal token pricing changed: Images are no longer flat-rate. A complex chart might cost 2000 tokens; a simple screenshot, 500. Your bill needs recalculating.
Inference mode choice: GPT-6 offers “fast mode” and “deep mode.” Fast is cheaper with lower latency but limited reasoning depth. Deep mode handles complex tasks but costs 40% more. Choose based on your use case.
Fine-tuning got harder: GPT-6 needs at least 100K high-quality examples for fine-tuning to pay off. Below that, prompt engineering beats it. That’s a barrier for small teams.
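If you want to rough out the billing impact of the first two gotchas, something like the sketch below is a starting point. The image token counts and the 40% deep-mode premium are the figures mentioned above. The per-token price is a made-up placeholder, the image category names are mine, and I am assuming the premium is measured against fast mode.

```python
# Token counts per image type, taken from the examples in this post.
IMAGE_TOKENS = {"simple_screenshot": 500, "complex_chart": 2000}

DEEP_MODE_MULTIPLIER = 1.40  # assumption: deep mode costs 40% more than fast mode
PRICE_PER_1K_TOKENS = 0.01   # hypothetical placeholder, not a real rate

def estimate_cost(text_tokens, images, mode="fast"):
    """Return (total_tokens, estimated_cost) for one request.

    images is a list of keys from IMAGE_TOKENS; mode is "fast" or "deep".
    """
    if mode not in ("fast", "deep"):
        raise ValueError(f"unknown mode: {mode}")
    total_tokens = text_tokens + sum(IMAGE_TOKENS[kind] for kind in images)
    cost = total_tokens / 1000 * PRICE_PER_1K_TOKENS
    if mode == "deep":
        cost *= DEEP_MODE_MULTIPLIER
    return total_tokens, cost

request_images = ["complex_chart", "simple_screenshot"]
for mode in ("fast", "deep"):
    tokens_used, cost = estimate_cost(1000, request_images, mode=mode)
    print(f"{mode}: {tokens_used} tokens, estimated cost {cost:.4f}")
```

Swap in your real price and your own mix of image types before trusting the totals. The useful part is seeing how quickly a few dense charts can outweigh the text in a request.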
Final Thoughts
GPT-6 isn’t a “parameter explosion → capability leap” kind of upgrade. It’s OpenAI making the model more usable, not just more powerful.
Which raises an old question: what are we actually competing on in the LLM race? Parameter count? Training data? Engineering?
GPT-6’s answer: as parameter scaling hits diminishing returns, engineering optimization beats brute force. What 2.8T parameters can do matters more than what 10T can’t.
That’s my take. What’s yours?