Just from basic logic this has to be false. Maybe there are some translucent t-shirts that are SPF 7, but my skin always reacts much more on sun-exposed parts that have SPF applied than it ever did under a t-shirt. And no, it's not the sunscreen: I use high-quality SPF 50 and reapply.
The matrix required for a fair comparison is getting too complicated, since you have to compare chat/thinking/pro against an array of Anthropic and Google models.
But they publish all the same numbers, so you can make the full comparison yourself, if you want to.
Not the OP, but I think "slight" here is in relation to Anthropic and Google. Claude Opus 4.5 comes at $25/MT (million tokens), Sonnet 4.5 at $22.5/MT, and Gemini 3 at $18/MT. GPT 5.2 at $14/MT is still the cheapest.
I used the pricing for long context (>200k) in all cases. I personally use AI as coding assistants, like lots of other people, and as such, hitting and exceeding 200k is quite the norm. The numbers you are showing are for <200k context length.
I also use them as coding assistants among other things, like lots of other people, and hitting and exceeding 200k is absolutely not the norm unless you're using a large number of huge MCP servers. At those context sizes output quality declines significantly, even with the claims of "we support long context". This is why all those coding assistants use auto-compression: not just to save money, but largely to maintain quality. In any case, >200k input calls are a small fraction of all calls.
Ironically at that input size, input costs dominate rather than output, so if that's the use case you're going for you want to be including those in your named prices anyway.
In particular, the API pricing for GPT-5.2 Pro has me wondering what on earth the possible market for that model is beyond getting to claim a couple of percent higher benchmark performance in press releases.
>Input:
>$21.00 / 1M tokens
>Output:
>$168.00 / 1M tokens
That's the most "don't use this" pricing I've seen on a model.
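To put those quoted rates in per-call terms, here is a rough sketch. The prices are the ones quoted above; the token counts are hypothetical, picked only to show how the input share grows with context size (the point made earlier about input costs dominating at long context).

```python
# Prices quoted above for GPT-5.2 Pro, in USD per 1M tokens.
INPUT_PRICE_PER_M = 21.00
OUTPUT_PRICE_PER_M = 168.00


def call_cost(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (input_cost, output_cost) in USD for a single API call."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost, output_cost


# Hypothetical call sizes (assumptions, not measured usage).
for input_tokens, output_tokens in [(20_000, 2_000), (200_000, 2_000)]:
    in_cost, out_cost = call_cost(input_tokens, output_tokens)
    total = in_cost + out_cost
    share = in_cost / total
    print(f"{input_tokens:>7} in / {output_tokens} out: "
          f"${total:.2f} total, {share:.0%} from input")
```

With 200k input tokens and 2k output tokens, input alone is about $4.20 of a roughly $4.54 call, so the headline output price says little about what a long-context call actually costs.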
Last year o3 high scored 88% on ARC-AGI 1 at more than $4,000/task. This model at its X high configuration scores 90.5% at just $11.64 per task.
General intelligence has gotten ridiculously less expensive. I don't know if it's because of compute and energy abundance, or attention mechanisms improving in efficiency, or both, but we have to acknowledge the bigger picture and relative prices.
Sure, but the reason I'm confused by the pricing is that the pricing doesn't exist in a vacuum.
Pro barely performs better than Thinking in OpenAI's published numbers, but comes at ~10x the price with an explicit disclaimer that it's slow on the order of minutes.
If the published performance numbers are accurate, it seems like it'd be incredibly difficult to justify the premium.
At least on the surface level, it looks like it exists mostly to juice benchmark claims.
It could be using the same trick Grok used (at least in its earlier versions): boot 10 agents that work on the problem in parallel and then get a consensus on the answer. This would explain the price and the latency.
Essentially a newbie trick that works really well but isn't efficient, while still looking like an amazing breakthrough.
(if someone knows the actual implementation I'm curious)
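For anyone unfamiliar with the idea, the "several agents in parallel, then consensus" approach speculated about above can be sketched as plain majority voting. This is only an illustration of the general technique, not a claim about how Pro or Grok is actually implemented; `run_agent` is a hypothetical stand-in for a real model call, and its 70% hit rate is made up.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def run_agent(problem: str, seed: int) -> str:
    """Hypothetical stand-in for one model call; returns a candidate answer."""
    rng = random.Random(seed)
    # Assumption for the demo: each agent returns "42" about 70% of the time.
    return "42" if rng.random() < 0.7 else rng.choice(["41", "43"])


def majority_vote(answers: list[str]) -> str:
    """Return the most common answer (ties go to the first one seen)."""
    answer, _votes = Counter(answers).most_common(1)[0]
    return answer


def consensus_answer(problem: str, n_agents: int = 10) -> str:
    """Run n_agents attempts in parallel and return the majority answer."""
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        answers = list(pool.map(lambda seed: run_agent(problem, seed),
                                range(n_agents)))
    return majority_vote(answers)


print(consensus_answer("What is 6 * 7?"))
```

Note that token usage scales linearly with `n_agents`, so ten attempts cost roughly ten times a single one, which is in the same ballpark as the price gap being discussed.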
Those prices seem geared toward people who are completely price insensitive, who just want "the best" at any cost. If the margins on that premium model are as high as they should be, it's a smart business move to give them what they want.
Pro solves many problems for me on first try that the other 5.1 models are unable to after many iterations. I don't pay API pricing but if I could afford it I would in some cases for the much higher context window it affords when a problem calls for it. I'd rather spend some tens of dollars to solve a problem than grind at it for hours.
Node has at least bun, and probably other tools, that attempt to speed things up in similar ways. New tooling is always coming for our languages of choice, even if we aren't paying attention.
Ty is still under very active development, so it either works or very much doesn't. I run it occasionally to see if it works on my codebases, and while it is getting closer, it isn't quite there yet.