I Tested GPT-6 Against GPT-5 with the Same Prompt — The Results Surprised Me

I stayed up late the night GPT-6 launched to watch the keynote.

Sam Altman stood on stage holding a potato — yes, the Spud codename was literal — talking about Symphony architecture, multimodal fusion, and ‘the last mile to AGI.’

The keynote was impressive, but my engineer instincts kicked in: don’t get excited, run a benchmark first.

So last week, I tested GPT-5 and GPT-6 with the exact same prompt.

My methodology

The task was a moderately complex code refactoring job: take a messy Python script, refactor it into a modular structure, optimize its performance, add type annotations, and write unit tests.

Why is this hard? It tests how well the model understands the overall code structure, how sensibly it splits the code into modules, how good its judgment is about what to optimize, and how complete its test coverage is.
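For anyone who wants to repeat this kind of comparison, here is a minimal sketch of a same-prompt timing harness. It is an illustration under stated assumptions, not the exact script behind the numbers in this post: `ModelFn`, `RunResult`, `run_same_prompt`, and the two `fake_model_*` functions are hypothetical names, and the fake models are stand-ins you would replace with calls to whatever client you actually use.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: any function that takes a prompt and returns the reply text.
ModelFn = Callable[[str], str]


@dataclass
class RunResult:
    model: str      # label for the model under test
    seconds: float  # wall-clock time for one call
    reply: str      # raw reply, kept for manual code review


def run_same_prompt(prompt: str, models: Dict[str, ModelFn]) -> List[RunResult]:
    """Send the identical prompt to every model and time each call."""
    results: List[RunResult] = []
    for name, ask in models.items():
        start = time.perf_counter()
        reply = ask(prompt)
        elapsed = time.perf_counter() - start
        results.append(RunResult(name, elapsed, reply))
    return results


# Stand-in models so the sketch runs offline; swap in real client calls.
def fake_model_a(prompt: str) -> str:
    return "A: " + prompt


def fake_model_b(prompt: str) -> str:
    return "B: " + prompt


PROMPT = (
    "Refactor this script into modules, optimize it, "
    "add type annotations, and write unit tests."
)

results = run_same_prompt(
    PROMPT, {"model-a": fake_model_a, "model-b": fake_model_b}
)
for r in results:
    print(f"{r.model}: {r.seconds:.3f}s, {len(r.reply)} chars")
```

The harness only measures response time. Code quality still needs a human to read the reply and try to ship it, which is where the rest of this comparison comes from.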

The results are in

Response time: GPT-6 was noticeably faster — 47 seconds vs 2+ minutes for GPT-5.

Code quality: this is where it gets interesting.

GPT-5’s approach was more conservative: it asked ‘do you want to review the current structure first?’ and then proceeded step by step.

GPT-6’s approach was more aggressive: it assumed it understood my needs and delivered the complete refactoring in one shot. The code was more ‘elegant’ and used more modern Python idioms.

But here’s the problem: GPT-6’s solution contained a hidden assumption I hadn’t stated. GPT-5 would have asked; GPT-6 guessed.

I implemented GPT-5’s solution in approximately 1 hour. GPT-6’s required 3 rounds of fixes, taking approximately 1.5 hours.
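To put the two sets of numbers side by side, here is a quick back-of-the-envelope calculation. It assumes "2+ minutes" means exactly 120 seconds (a lower bound) and takes the approximate 1 hour and 1.5 hours at face value, so treat the output as rough.

```python
# Figures from the test above. 120 s is an assumed lower bound for "2+ minutes".
GPT5_RESPONSE_S = 120
GPT6_RESPONSE_S = 47
GPT5_IMPLEMENT_MIN = 60   # "approximately 1 hour"
GPT6_IMPLEMENT_MIN = 90   # "approximately 1.5 hours"

# How many times faster GPT-6 responded.
response_speedup = GPT5_RESPONSE_S / GPT6_RESPONSE_S

# How many times longer GPT-6's solution took to get working.
implement_ratio = GPT6_IMPLEMENT_MIN / GPT5_IMPLEMENT_MIN

# Seconds saved waiting for the reply vs. seconds lost fixing the code.
saved_s = GPT5_RESPONSE_S - GPT6_RESPONSE_S
lost_s = (GPT6_IMPLEMENT_MIN - GPT5_IMPLEMENT_MIN) * 60
net_lost_min = (lost_s - saved_s) / 60

print(f"Response speedup: {response_speedup:.2f}x")       # 2.55x
print(f"Implementation time ratio: {implement_ratio:.1f}x")  # 1.5x
print(f"Net time lost with GPT-6: {net_lost_min:.1f} min")   # 28.8 min
```

GPT-6 saved about 73 seconds on the response and cost about 30 minutes in fixes, so on this one task the faster reply did not come close to paying for itself.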

My conclusion

GPT-6 is genuinely better, but I’m skeptical of the ‘40% improvement’ claim.

For simple tasks, GPT-6’s speed advantage is obvious. For complex tasks, GPT-6’s ‘confidence’ is a double-edged sword: when it guesses right, you finish faster; when it guesses wrong, the fixes cost more time than the fast response saved.

The Symphony architecture sounds impressive, but in real engineering work there is still room for improvement.
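One way to reason about that trade-off is as an expected-time calculation. The sketch below is a simplified model, not something I measured: the 60 and 90 minutes come from my test, but the 30 minutes for a correct guess is a purely hypothetical figure, and `expected_minutes` and `break_even_probability` are names made up for illustration.

```python
def expected_minutes(p_right: float, t_right: float, t_wrong: float) -> float:
    """Expected time when the model guesses right with probability p_right."""
    return p_right * t_right + (1 - p_right) * t_wrong


def break_even_probability(t_ask: float, t_right: float, t_wrong: float) -> float:
    """Probability of a correct guess at which guessing ties with asking first.

    Solves p * t_right + (1 - p) * t_wrong == t_ask for p.
    """
    if not t_right < t_ask < t_wrong:
        raise ValueError("expected t_right < t_ask < t_wrong")
    return (t_wrong - t_ask) / (t_wrong - t_right)


T_ASK = 60          # minutes with the model that asked first (from my test)
T_WRONG = 90        # minutes when the guess was wrong (from my test)
T_RIGHT_GUESS = 30  # HYPOTHETICAL: minutes if the guess had been right

p_star = break_even_probability(T_ASK, T_RIGHT_GUESS, T_WRONG)
print(f"Break-even probability of a correct guess: {p_star:.0%}")
print(f"Expected minutes at break-even: {expected_minutes(p_star, T_RIGHT_GUESS, T_WRONG):.0f}")
```

Under these assumed numbers, the guessing model only comes out ahead if it guesses right more than half the time. Plug in your own timings; the break-even point moves a lot depending on how expensive a wrong guess is.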

My take: don’t trust the keynote, run your own benchmarks.

Anyone else using GPT-6? What’s your experience been?