Claude Opus 4.7 Released at Midnight: I Ran Some Tests, This Upgrade Is Substantial
Late on April 16th, Anthropic quietly released Claude Opus 4.7.
I say quietly, but it made quite a splash—the Opus series is Claude’s flagship, and the timing was exactly three days after GPT-5.4’s release. Hard to dismiss as coincidence.
I ran several tests overnight. Here’s what I found.
First, the most noticeable improvement: it writes code much more reliably.
I had it refactor a core module—about 3,000 lines. With Claude 4.6, I often hit frustrating issues—code that looked runnable but handled edge cases wrong, or referenced APIs from two years ago.
Version 4.7 shows clear improvement. Not perfect, but error frequency and severity both decreased. More importantly, when you point out mistakes, it understands faster and provides better fixes.
How to describe this? Like evolving from “a smart but sloppy intern” to “an experienced mid-level engineer.”
Official numbers support this. On SWE-bench Verified—a benchmark testing models’ ability to solve real GitHub issues—Opus 4.7 scored significantly higher than 4.6. This test uses real problems, making it more meaningful.
But let me throw cold water on the hype.
Opus 4.7’s pricing remains unchanged—$5 per million input tokens, $15 per million output. In today’s market where everyone’s slashing prices, that’s definitely not cheap.
And for most everyday tasks, this performance boost might not be very noticeable. Ask it to write a simple script or edit a config file, and the difference between 4.6 and 4.7 might be like 90 vs 95. Only when tackling complex engineering problems does that 5-point gap become significant.
I also noticed something interesting: Opus 4.7 seems more cautious.
Previously, when asked questions, it gave direct answers even when uncertain. Now it acknowledges uncertainty first, then presents several possible solutions. From a reliability standpoint this is good, but some might find it hesitant.
This is AI’s eternal dilemma: balancing confidence and accuracy. Too confident leads to errors; too cautious feels slow. Opus 4.7 clearly leaned toward caution.
Another point: Anthropic simultaneously released Claude Mythos Preview, specifically for cybersecurity testing. Apparently only partners like NVIDIA and JPMorgan have access. This specialized approach might be Anthropic’s focus going forward.
Overall, Opus 4.7 is a solid upgrade, not just a marketing gimmick. But if you’re not a heavy programming user, or budget-constrained, stick with 4.6 and wait for prices to drop.
After all, models keep getting better, but my wallet doesn’t magically thicken.