GPT-6 Officially Launched: Codename 'Spud', 2M Context + 40% Performance Boost

After 18 months of waiting, OpenAI finally served the ‘potato’.

On April 14th, GPT-6 went live globally. Codename: Spud. Potato. My first reaction? That’s a pretty down-to-earth name for a flagship model. But then I thought about it—potatoes are the world’s fourth-largest food crop. Cheap, abundant, filling. If this thing can spread like potatoes, that’s actually saying something.

So what makes this ‘potato’ different?

First, the big one: context window jumped from 200K to 2M tokens.

What’s 2M tokens? Roughly enough to fit the entire Dream of the Red Chamber. Before, if you threw a long paper at the model, it’d forget the beginning by the time it reached the middle. Now? It can swallow a whole book and spit it back out. For code analysis, long-form writing, legal contract review—this is a qualitative leap.

I tested it myself, throwing a 5,000-line TypeScript project at it to analyze architecture issues and potential bugs. GPT-5.4 could only digest half before losing context. Now it can fully analyze dependencies across all files. Honestly, that ‘I remember what we talked about’ feeling makes it seem less like chatting with a forgetful AI.

Performance-wise, the official claim is 40% improvement over GPT-5.4.

I take that with a grain of salt. I ran some benchmarks: math reasoning did improve significantly, but the gains in code generation accuracy weren’t as dramatic. Some basic mistakes (like calling async functions as if they were synchronous) still slip through. Overall flow is definitely smoother though, especially conversation continuity across multiple turns.
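For anyone who hasn’t been bitten by the async-as-sync mistake, here’s a minimal sketch of what it looks like. My test project was TypeScript, but the same bug is easy to show in Python. The function names (fetch_user_name, greet_buggy, greet_fixed) are made up for illustration; this is not actual model output.

```python
import asyncio


async def fetch_user_name(user_id: int) -> str:
    """Hypothetical async lookup; the sleep stands in for network I/O."""
    await asyncio.sleep(0)
    return f"user-{user_id}"


async def greet_buggy(user_id: int) -> str:
    # Bug: no `await`, so `name` is a coroutine object, not a string.
    name = fetch_user_name(user_id)
    greeting = f"Hello, {name}"
    # Close the never-awaited coroutine so Python does not emit a warning.
    name.close()
    return greeting


async def greet_fixed(user_id: int) -> str:
    # Fix: await the coroutine to get the actual string.
    name = await fetch_user_name(user_id)
    return f"Hello, {name}"


buggy_result = asyncio.run(greet_buggy(7))
fixed_result = asyncio.run(greet_fixed(7))
print(buggy_result)
print(fixed_result)
```

The buggy version runs without raising, which is exactly why this kind of mistake slips through: the greeting just contains a coroutine object instead of a name.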

One interesting detail: GPT-6 uses a brand-new ‘Symphony Architecture’. Officially, it’s like an orchestra where multiple expert modules work together. Sounds fancy, but it’s essentially an upgraded MoE (Mixture of Experts) architecture—just with a smarter ‘conductor’ knowing when to call which expert.
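To make the ‘conductor’ idea concrete, here’s a toy sketch of generic top-k MoE routing. To be clear, this is not GPT-6’s actual architecture, which I haven’t seen; the gate weights, the three toy experts, and the route_top_k helper are all invented for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[Sequence[float]], List[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(
    x: Sequence[float],
    gate_weights: Sequence[Sequence[float]],
    experts: Sequence[Expert],
    k: int = 2,
) -> Tuple[List[float], List[int]]:
    """Score every expert, run only the k best, and blend their outputs."""
    # The "conductor": one gate score per expert (dot product with the input).
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    # Keep the indices of the k highest-scoring experts.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Normalise the chosen scores so the mixing weights sum to 1.
    weights = softmax([scores[i] for i in chosen])
    output = [0.0] * len(x)
    for weight, index in zip(weights, chosen):
        expert_out = experts[index](x)
        output = [o + weight * y for o, y in zip(output, expert_out)]
    return output, chosen


# Three toy "experts" (assumptions, purely illustrative).
toy_experts: List[Expert] = [
    lambda v: [2.0 * a for a in v],   # doubles the input
    lambda v: [-a for a in v],        # negates the input
    lambda v: list(v),                # passes the input through
]
toy_gate = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

mixed, chosen_experts = route_top_k([2.0, 1.0], toy_gate, toy_experts, k=2)
print(chosen_experts, mixed)
```

The point of the design is that only k experts run per input, so compute per call stays roughly flat even as the total number of experts grows. A ‘smarter conductor’ would presumably mean a better gate, but that part is my reading, not something confirmed in the announcement.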

AGI’s final mile? Altman overshot that one.

At the launch, Sam Altman straight-up called it ‘the final mile to AGI.’ Personally? That’s marketing overdrive.

Yes, GPT-6 shows significant progress in long context, multimodal understanding, and tool use. But old problems persist: hallucinations, broken reasoning chains, insufficient generalization to complex real-world scenarios. Ask it to write a simple automation script? No problem. But have it independently complete an entire product development pipeline? Still a long way off.

One test stuck with me: I asked it to plan a two-week Japan trip—flights, hotels, attractions, transport. It produced a ‘theoretically perfect’ itinerary. Except some attractions don’t exist, and hotel names don’t match. What does this tell us? It still goes down that ‘looks reasonable but actually wrong’ path.

Pricing stayed the same—pretty decent move.

API pricing matches GPT-5.4: $5/M input tokens, $15/M output. Given the performance boost and context expansion, that’s fair pricing. But here’s the catch: a 2M context means a single call can consume far more tokens. Fill the whole window and that’s $10 of input on one request, so your bill might be higher than expected.
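To put numbers on that catch, here’s a back-of-the-envelope cost sketch using the rates quoted above ($5/M input, $15/M output). The token counts for the two example calls are my own assumptions, and real billing may differ (caching, tiered rates, and so on).

```python
INPUT_USD_PER_MILLION = 5.0    # quoted input rate
OUTPUT_USD_PER_MILLION = 15.0  # quoted output rate


def estimate_call_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one API call at the quoted rates."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


# Assumed workloads: a short Q&A call vs. a call that fills the 2M window.
short_call = estimate_call_cost(input_tokens=2_000, output_tokens=500)
full_window_call = estimate_call_cost(input_tokens=2_000_000, output_tokens=4_000)

print(f"short call:       ${short_call:.4f}")
print(f"full-window call: ${full_window_call:.2f}")
```

Under these assumptions the short call costs under two cents, while the full-window call lands around ten dollars. Same price per token, very different price per request.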

My take: If you mainly do short-text work (customer service, translation, simple Q&A), no rush to upgrade. GPT-5.4 is plenty. But if you need long-document analysis, code review, complex reasoning—GPT-6’s 2M context is worth exploring.

Bottom line: This ‘potato’ is tastier than before, but still a ways from ‘filling everyone up.’ AGI’s final mile? I’d say there are still a few marathons to run.