
Anthropic just made its mid-tier model embarrassingly good at tasks that used to require its flagship. Claude Sonnet 4.6 launched today with performance that often beats November’s Opus 4.5. Plus, it comes with a massive 1M-token context window that can swallow entire codebases in one go.
The Sonnet that thinks it’s an Opus
Here’s what’s actually surprising about this release: developers with early access preferred Sonnet 4.6 over Opus 4.5, the previous flagship, 59% of the time. That’s not a marginal improvement. That’s a tier collapse.
The new model brings what Anthropic calls “much-improved coding skills” with better consistency and instruction following. Early testers reported fewer hallucinations and less overengineering. More importantly, there’s less of that infuriating “laziness” where models claim they’ve completed tasks they haven’t actually finished.
But the real standout feature is that 1M token context window. We’re talking about enough space to hold dozens of research papers, lengthy contracts, or complete codebases in a single conversation. What matters more is that Sonnet 4.6 can actually reason across all that context effectively, not just store it.
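To get a feel for what "a complete codebase in a single conversation" means in practice, here is a minimal sketch that estimates whether a directory of source files would fit in a 1M-token window. It assumes a rough rule of thumb of about four characters per token, which is only a heuristic and not the model's real tokenizer. The function names and the default file suffixes are hypothetical, chosen for illustration.

```python
from pathlib import Path

CONTEXT_WINDOW_TOKENS = 1_000_000  # the 1M figure quoted in the article
CHARS_PER_TOKEN = 4                # assumption: rough heuristic, not a real tokenizer


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Very rough token estimate based purely on character count."""
    return int(len(text) / chars_per_token)


def estimate_codebase_tokens(root: str, suffixes=(".py", ".md")) -> int:
    """Sum the rough token estimates of all matching files under root."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            total += estimate_tokens(text)
    return total


def fits_in_context(token_count: int, window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """True if the estimate is within the window (leaves no room for the reply)."""
    return token_count <= window


if __name__ == "__main__":
    tokens = estimate_codebase_tokens(".")
    print(f"~{tokens:,} tokens; fits in window: {fits_in_context(tokens)}")
```

By this estimate, 1M tokens is on the order of 4 MB of plain text. Real token counts vary with language and formatting, so treat the result as a ballpark and leave headroom for the prompt and the model's response.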
Computer use gets less embarrassing
Remember when Anthropic first introduced computer use back in October 2024? They called it “experimental” and “at times cumbersome and error-prone.” Classic AI company understatement for “barely functional.”
Sixteen months later, the progress is genuinely impressive. On OSWorld, the standard benchmark for AI computer use, Sonnet models have made steady gains navigating real software like Chrome, LibreOffice, and VS Code. No special APIs or connectors required. Just clicking and typing like a human would.
Early users report human-level capability on specific tasks, like navigating complex spreadsheets and filling out multi-step web forms across browser tabs. Overall, the model still can’t match skilled humans at computer use, but the trajectory is remarkable.
That said, computer use remains a security nightmare waiting to happen. Malicious actors can hide instructions on websites in prompt injection attacks, essentially hijacking the AI. Anthropic claims Sonnet 4.6 shows major improvements in resisting these attacks. Still, this feels like an arms race that’s just getting started.
The million-token party trick that actually works
Many long-context models are impressive on paper but disappointing in practice. They can technically hold massive amounts of text but can’t meaningfully reason across it all.
Sonnet 4.6 seems different. In Vending-Bench Arena evaluations, where AI models compete to run profitable simulated businesses over time, the new model developed an interesting strategy. It invested heavily in capacity for the first ten months, spending significantly more than competitors. Then it pivoted sharply to profitability in the final stretch. The timing helped it finish well ahead of the competition.
Meanwhile, 70% of users in Claude Code preferred Sonnet 4.6 over its predecessor, Sonnet 4.5. They specifically called out better context reading before modifying code and smarter consolidation of shared logic instead of duplicating it everywhere.
Pricing stays put while performance jumps
The most compelling part? Anthropic kept pricing identical to Sonnet 4.5: $3 per million input tokens and $15 per million output tokens. For Free and Pro users, Sonnet 4.6 becomes the default model in claude.ai and Claude Cowork.
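For a sense of what those rates mean per request, here is a back-of-the-envelope sketch. It reads the $3/$15 figure as dollars per million input and output tokens respectively, and it assumes a flat rate, ignoring any discounts or surcharges a real bill might include. The function name and the example token counts are hypothetical.

```python
INPUT_USD_PER_MTOK = 3.00    # assumption: $3 per million input tokens
OUTPUT_USD_PER_MTOK = 15.00  # assumption: $15 per million output tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Flat-rate cost estimate in US dollars for one request."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: a 200K-token prompt with a 4K-token answer.
    cost = estimate_cost_usd(200_000, 4_000)
    print(f"${cost:.2f}")  # 0.60 for input + 0.06 for output = $0.66
```

The asymmetry is the thing to notice: output tokens cost five times as much as input tokens, so a long prompt with a short answer is far cheaper than the reverse.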
Customer testimonials read like a greatest hits of AI tooling companies. GitHub’s VP of Product praised bug detection improvements. Cursor’s CEO highlighted complex code fixes across large codebases. Replit’s President called the performance-to-cost ratio “extraordinary.”
Look, these quotes always sound breathless and promotional. However, the consistent theme around long-horizon tasks and complex reasoning suggests something meaningful has shifted under the hood.
The mid-tier model that ate the flagship
This release represents something bigger than incremental improvements.
When your mid-tier model starts consistently outperforming last quarter’s flagship, either you’re seeing genuine algorithmic breakthroughs or your previous flagship was overpriced theater. Given the trajectory of AI capabilities lately, it’s probably the former. The question isn’t whether Sonnet 4.6 is good enough for most tasks. It’s whether Anthropic can maintain these improvement curves without hitting some nasty scaling wall that nobody’s talking about yet.



