> Given the cited stats here and elsewhere as well as in everyday experience, does anyone else feel that this model isn’t significantly different, at least to justify the full version increment?
My experience is the opposite - I'm using it in Cursor and IMO it's performing better than Gemini 2.5 Pro at being able to write code which will run first time (which it wasn't before) and seems to be able to complete much larger tasks. It is even running test cases itself without being prompted, which is novel!
I'm a developer, and I've been trying to use AI to vibe code apps for two years. This is the first time I'm able to vibe code an app without major manual interventions at every step. Not saying it's perfect, or that I'd necessarily trust it without human review, but I did vibe code an entire production-ready iOS/Android/web app that accepts payments in less than 24 hours and barely had to manually intervene at all, besides telling it what I wanted to do next.
My experience is the opposite - I'm using it in Cursor and IMO it's performing better than Gemini 2.5 Pro at being able to write code which will run first time (which it wasn't before) and seems to be able to complete much larger tasks. It is even running test cases itself without being prompted, which is novel!