When discussing LLM pricing, people are missing the plot. The subscription token...

simonw · 2026-05-26T15:09:54 1779808194

I learned today that the Anthropic "Enterprise" plan - the one big companies use because they need governance features and audit logs and all of that jazz - is billed at API token rates (plus $20/seat/month).

So large companies are getting billed a lot more than those discount subscription plans.

alexriddle · 2026-05-26T15:22:13 1779808933

Anything over 150 seats means you need to pay at token rates plus the $20/user. My day job is operational (no coding at all) and I'm spending ~$300 a month on a few chats with Claude/Cowork a day over the course of a month.

m_kos · 2026-05-26T18:17:25 1779819445

$300 is my employer's monthly cap on Claude Enterprise. It lasts me at most a week of moderate use. I would much rather get Codex Pro and Claude Pro or Max, which would cost ≤ $200. For $300, one could also add Gemini Ultra to the mix so I could have all three review each other's code, etc.

Claude can be very good but enterprise pricing doesn't make sense to me.

lunar_mycroft · 2026-05-26T18:39:39 1779820779

The $200 plan you're talking about is subsidized by Anthropic. They cannot afford to keep offering that to everyone indefinitely. Absolute best case scenario for current users is that they can continue to subsidize it as a way to sell enterprise plans, but there's no way that they can keep offering it to everyone at those prices.

wahnfrieden · 2026-05-26T20:51:13 1779828673

They can if it is a way to get individuals hooked on it to then introduce it at their workplaces, who pay enterprise rates.

lunar_mycroft · 2026-05-26T23:59:31 1779839971

Right, they can do it to sell enterprise plans, but they can't offer said plans to those enterprise customers indefinitely. So if your employer wants to spend $200/month on tokens, you're going to get however many tokens $200 buys you each month, not the order of magnitude more you can get with a consumer subscription.

wahnfrieden · 2026-05-27T01:36:34 1779845794

That’s what I’m saying. Enterprise customers don’t use the subscription plan

addedGone · 2026-05-27T12:35:03 1779885303

Except that they do, we do.

A lot of startups pile up an enormous number of accounts. Companies don't need the Enterprise Anthropic solution; they can just subscribe to many accounts and have their own staff KYC for each (1 codex, 1 claude, 1 google and so on).

lunar_mycroft · 2026-05-27T16:29:40 1779899380

That will be clamped down on by Anthropic (and other providers) for the same reason they don't offer those plans to enterprise customers already.

Bnjoroge · 2026-05-27T14:53:00 1779893580

I imagine it’s also really trivial to build some kind of local “enterprise” proxy that gives you the same visibility in usage as the anthropic dashboard would give you. I use one for aggregating all my subs.

ilikehurdles · 2026-05-27T17:01:03 1779901263

We definitely pay enterprise api costs. Only way to get google vertex integration, and Enterprise is too sensitive to let all of their data leave their moat.

wahnfrieden · 2026-05-27T16:53:15 1779900795

Startups do but do you know large enterprises that do?

goosejuice · 2026-05-26T22:45:02 1779835502

> They cannot afford to keep offering that to everyone indefinitely.

Common talking point. There's enough evidence for the counter argument that this is essentially misinformation. I have no idea why it's so often repeated with confidence.

cdata · 2026-05-26T22:55:55 1779836155

> There's enough evidence for the counter argument that this is essentially misinformation.

> No evidence is shared

Help an open-minded critic out.

goosejuice · 2026-05-26T23:30:29 1779838229

Brand new industry, massive capital, dropping inference costs, increasing availability of compute, cost centers / subsidized subscriptions are common in SaaS, heavy competition, no public information on actual utilization rates.

How much is Waymo burning a year? 3B on 300M ARR? Anthropic is what 5B on 20B ARR? Waymo is 3x older. Why don't we hear such confident statements about how subsidized their rides are?

It's one thing to speculate it's another to parade it as fact. Even if the S1 reveals an unprofitable business today, you can still only claim it's unlikely.

lmm · 2026-05-27T00:03:27 1779840207

> How much is Waymo burning a year? 3B on 300M ARR? Anthropic is what 5B on 20B ARR? Waymo is 3x older. Why don't we hear such confident statements about how subsidized their rides are?

We do. We hear it less often because no-one is talking about how Waymo changes how we all need to work or whatever, that's all.

lunar_mycroft · 2026-05-27T00:12:40 1779840760

Do people commonly argue Waymo isn't subsidizing rates?

Also, we do have some evidence for my position:

- We know that the consumer Claude plans provide _way_ more tokens than you could get if you were paying API prices. This is a huge part of why Anthropic's limits on other harnesses for subscription customers are such a big deal. So either their profit margin on API tokens is absurdly high, most consumer subscribers don't come anywhere near their rate limits, or they're losing money on the consumer subscriptions.
- It appears that complaints about people running into rate limits are common, which suggests the "consumers usually don't use much of their subscription" explanation is incorrect.
- We also know that Anthropic has just become profitable, almost certainly driven mostly by enterprise customers. This rules out the "they make a very high profit margin on the API" explanation, since if that were the case they'd likely have been profitable much earlier.

Taken together, I think the case that their consumer subscriptions lose them money on net is pretty strong, even though their enterprise subscriptions (and API pricing) does make them a profit.

goosejuice · 2026-05-27T02:10:33 1779847833

> I think the case that their consumer subscriptions lose them money on net is pretty strong, even though their enterprise subscriptions (and API pricing) does make them a profit.

To be clear, I'm not arguing against this position, just questioning the confidence with which people claim that the current consumer subs are not a sustainable offering and are merely temporary.

fluidcruft · 2026-05-27T11:19:56 1779880796

Burning money is never sustainable. All you're actually saying is nobody can predict how long this particular bonfire will burn.

goosejuice · 2026-05-27T14:32:09 1779892329

Again this is nonsense for the reasons I've already given. The costs aren't fixed.

ilikehurdles · 2026-05-26T22:06:29 1779833189

That’s a shocking number. I don’t know how much my employer is billed, but based on the numbers reported by Claude code in its optional status bar, I’m often exceeding $300 in a day across sessions, when working on meatier tickets.

stymaar · 2026-05-26T15:33:26 1779809606

I hope your company is keeping the input/response pair in case they need to break free at some point.

dd8601fn · 2026-05-26T17:28:36 1779816516

Wouldn’t people mostly just want any artifacts?

speed_spread · 2026-05-27T02:31:15 1779849075

Like Slack history, LLM history can be used to build searchable knowledge base. Questions are often more valuable than answers.

stavros · 2026-05-26T22:06:33 1779833193

We deployed OpenWebUI with the Claude API the other day for employees. Someone sent ten messages (which appeared to just be reasonable day-to-day work), and we paid $200 for it. There were 44M input tokens, 100k output tokens, no cache hits at all. OpenWebUI reports 3M tokens used, Claude reports 44M, and I have no idea where the rest of the tokens went. This was all on a brand new API key, installed directly to the service, too.

With this kind of opaque billing, how can I reasonably deploy any AI?

SyneRyder · 2026-05-27T10:55:41 1779879341

No cache hits seems ominous, could this be an OpenWebUI issue? It also seems ominous that Anthropic models are basically nowhere on the OpenWebUI leaderboards.

I'm only doing a cursory search, but it seems OpenWebUI doesn't support Anthropic caching, and they don't intend to? Other providers handle caching automatically (apparently?) but caching has to be specifically managed by the client with Anthropic. If that's correct that OpenWebUI doesn't support it, it would really send your costs spiralling, because you're being billed for all the tokens in the entire multi-turn conversation on every turn:

https://github.com/open-webui/open-webui/issues/4887

I have no experience with OpenWebUI though (honestly, first time I've heard of it). Just trying to be helpful. If I'm completely incorrect then apologies in advance for sending you down the wrong path.
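To put rough numbers on the "billed for the entire conversation on every turn" point, here is a minimal sketch of how billed input tokens grow when each turn resends the full history. The function names, the turn sizes, and the `read_discount` value are illustrative assumptions, not OpenWebUI's or Anthropic's actual accounting.

```python
def billed_input_uncached(new_tokens_per_turn):
    """Total input tokens billed when every turn resends the whole history."""
    total = 0
    context = 0
    for new in new_tokens_per_turn:
        context += new      # history grows by this turn's new tokens
        total += context    # the full history is billed again as input
    return total


def billed_input_with_prefix_cache(new_tokens_per_turn, read_discount=0.1):
    """Token-equivalents billed if the already-seen prefix is charged at a
    discounted rate (read_discount is a hypothetical parameter)."""
    total = 0.0
    context = 0
    for new in new_tokens_per_turn:
        total += context * read_discount + new  # cheap prefix + full-price new tokens
        context += new
    return total


# Hypothetical conversation: ten turns, each adding 50k tokens of new context.
turns = [50_000] * 10
uncached = billed_input_uncached(turns)
cached = billed_input_with_prefix_cache(turns)
print(f"uncached: {uncached:,} tokens, cached: {cached:,.0f} token-equivalents")
```

The uncached total grows quadratically with the number of turns, which is one way a conversation that "contains" a few million tokens can be billed for many times that.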

stavros · 2026-05-27T13:48:02 1779889682

Really? Huh, I've never heard of Anthropic caching needing to be specifically enabled. I'll look into that, thank you! Sounds like the culprit.

jgreid · 2026-05-26T16:15:46 1779812146

Governance and audit trail are incredibly valuable to large enterprise organizations, especially those working in regulated spaces. Companies will pay a premium if the security/privacy/compliance issues are handled effectively.

zaphirplane · 2026-05-27T01:49:18 1779846558

What is the governance and audit trail on offer?

chinathrow · 2026-05-27T20:05:23 1779912323

Do we know that they all pay the API rates or will they negotiate individually?

zackify · 2026-05-26T17:13:23 1779815603

We are on it at my job. It saves money due to other parts of the org not using as many tokens.

The real cost effective way is giving a team $20 cursor $20-100 Claude $20-200 codex.

I'm spending 1k on Claude enterprise easily and that's with trying to spread it on codex and cursor using pi.

htrp · 2026-05-26T20:14:33 1779826473

> I learned today that the Anthropic "Enterprise" plan - the one big companies use because they need governance features and audit logs and all of that jazz - is billed at API token rates (plus $20/seat/month).

Can't large enterprises just use the API? I have audit logs and what seem to be enterprise features through my anthropic account (platform.claude.ai)

simonw · 2026-05-26T21:20:02 1779830402

They can do that, but I expect they see individual user accounts and enterprise account management and easy rollout of Claude.ai/Cowork/Claude Code/etc as worth an extra $20/month/person.

thewebguyd · 2026-05-26T20:43:22 1779828202

The devs can, sure. The "enterprise plan" is more for that + giving Claude to all the non-technical employees for access to the chatbot + Cowork. Plus SSO and all that jazz.

acdha · 2026-05-27T02:10:28 1779847828

Enterprises can, but then they have to show their auditors that this has been done in a way which is robust and can’t be bypassed, and they have to build the kind of reports people need to be convinced of that — nothing is ever “just” in enterprise IT.

Longer term, you also have to be careful about building things around details which could change at any time. OpenAI and Anthropic have a ton of pressure to start banking huge profits and they very closely monitor customer activity. A time-honored strategy in this space is to shuffle the features enterprise customers depend on but which aren’t deal-breakers for most other customers into expensive enterprise plans. There’s possibly some counter pressure from companies like Google which have healthier finances but I wouldn’t count on that since they also have MBAs who’d be all too happy to invent pretexts to hike their prices to match.

opsnooperfax · 2026-05-27T00:02:41 1779840161

Your CISO is paying to not be responsible. That’s it. That’s always the reason.

pyreko · 2026-05-26T18:30:30 1779820230

Yep, where I work I know people easily spending over a few thousand dollars a month.

datadrivenangel · 2026-05-26T15:36:42 1779809802

I've heard that the $20/seat gets waived if you have large enough committed spend.

isoprophlex · 2026-05-26T16:15:53 1779812153

Would they even care at that scale, if the average employee spends $3000 every month because mgmt mandates slopmaxxing?

stymaar · 2026-05-26T15:31:46 1779809506

> Lastly, there is a massive difference in capabilities, determinism, and error handling between 5T SOTA models like Opus

What's your source for Opus being a 5T model?

> and tiny distillations from DeepSeek that perform well only in benchmarks.

I don't think you know what you're talking about. Local models aren't “distillations from Deepseek”.

And they don't perform well “only in benchmarks”, Qwen 3.6 is a very decent model (obviously it's not Opus, but it's also much faster and speed is a quality of its own).

gpugreg · 2026-05-26T15:41:49 1779810109

> What's your source for Opus being a 5T model?

Elon Musk tweeted that Grok is 0.5T or 1/10th the size of Opus. https://xcancel.com/elonmusk/status/2042123561666855235#m

While this source's reliability is certainly debatable, the size matches the results of this paper, in which researchers estimated the parameter count from model knowledge. https://01.me/research/ikp/

stymaar · 2026-05-26T16:02:12 1779811332

> While this source's reliability is certainly debatable

Massive understatement. Nowadays it has become hard to find a single Musk statement that doesn't contain at least one lie.

> the size matches the results of this paper, in which researchers estimated the parameter count from model knowledge. https://01.me/research/ikp/

Thanks for the pointer. This estimation has Grok 6 times bigger than Musk claims it is, so maybe that's where the lie is.

(I'm quite skeptical about that number though, it would be quite disappointing for the US tech if their flagship models had to be that much larger than the Chinese ones for such a small edge in performance. Because I don't think US labs are incompetent, I'd bet that US flagships aren't more than 2 to 3 times bigger than Chinese flagships. Otherwise it really doesn't bode well.)

striking · 2026-05-26T16:21:32 1779812492

In tiny gray text right above the table is written "90% PI ≈ ±3.00× either side." Is GPT-5.5-Pro 3.4T or 30.8T in size, or somewhere in between? We just don't know.
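For a sense of how wide a "±3× either side" band is, a small sketch follows. It assumes the interval is multiplicative (divide and multiply the point estimate by the factor), which is one reading of that note rather than something checked against the paper's method; the function name and the example value are made up for illustration.

```python
def multiplicative_interval(point_estimate, factor=3.0):
    """Return (low, high) for an estimate known only to within a
    multiplicative factor on either side."""
    if point_estimate <= 0 or factor < 1:
        raise ValueError("need a positive estimate and a factor >= 1")
    return point_estimate / factor, point_estimate * factor


# Illustrative point estimate, in trillions of parameters.
low, high = multiplicative_interval(5.3)
spread = high / low  # equals factor squared, regardless of the estimate
print(f"{low:.1f}T to {high:.1f}T, a {spread:.0f}x spread")
```

Under that reading, the top of the band is always nine times the bottom, so two models whose point estimates differ by 2x or 3x are not distinguishable.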

fluidcruft · 2026-05-27T11:27:23 1779881243

Musk has a lot of incentive to explain away how horrible Grok is relative to Opus.

It's certainly a better sell that Grok sucks because it's small and Opus is impressive because it's large, than the alternative that Grok is also large and sucks which points to xAI incompetence and mismanagement.

Particularly when you're trying to IPO a rocket company based on rosy forecasted valuations of Grok dominating the market.

orphea · 2026-05-27T09:38:33 1779874713

  Elon Musk tweeted

Come on. The Onion would be a more credible source.

UltraSane · 2026-05-27T04:53:18 1779857598

Elon Musk has absolutely no credibility anymore. I'm more likely to believe the opposite of what he claims to be true.

kakacik · 2026-05-27T11:49:37 1779882577

aka the russian strategy

Chyzwar · 2026-05-26T18:56:17 1779821777

https://arxiv.org/abs/2604.24827

From this paper

stymaar · 2026-05-26T19:25:59 1779823559

That's not what the paper says though:

    Claude Opus 4.6 Anthropic 68.0% ∼5.3T [1.8–15.6T]
    Claude Opus 4.7 Anthropic 66.4% ∼4.0T [1.4–12.0T]
    Claude Opus 4.5 Anthropic 65.2% ∼3.4T [1.1–10.0T]
    Claude Opus 4.1 Anthropic 64.9% ∼3.2T [1.1–9.5T]
    Claude Opus 4 Anthropic 59.7% ∼1.4T [478B–4.2T]

According to their estimation, Opus is likely between 1T and 15T, which really doesn't tell you much that you couldn't have guessed otherwise. It doesn't say “Opus is a 5T model”.

The fact that there's absolutely no consistency in the predicted size between models from the same lab should tell you all you need about the predictive power of this method (and they aren't really lying about their numbers, their confidence interval is huge enough to fit anything in it, but their prose is making very strong claims out of their statistical nothingburger).

(somebody already posted this paper earlier, and I spent some time reading it, and this paper is really not that good even though there are a bunch of interesting ideas in it).

layer8 · 2026-05-26T15:47:22 1779810442

> What's your source for Opus being a 5T model?

Probably Elon Musk: https://eu.36kr.com/en/p/3760679047267075

UltraSane · 2026-05-27T04:54:46 1779857686

I don't know why stymaar's comment is flagged and dead, he is 100% correct.

stymaar · 2026-05-26T15:50:16 1779810616

[flagged]

ramesh31 · 2026-05-26T18:03:05 1779818585

People can simultaneously be reprehensible idiots while being a reliable expert on something they have personally invested billions of dollars into and operate at scale.

overfeed · 2026-05-26T18:31:26 1779820286

> ...while being a reliable expert on something they have personally invested billions of dollars into and operate at scale.

Like "Full Self-Driving" from coast-to-coast by 2016?

awkwardpotato · 2026-05-26T18:16:01 1779819361

He's also invested billions of dollars in SpaceX and Tesla... which he regularly makes wild claims about that are untrue.

amanaplanacanal · 2026-05-26T18:55:25 1779821725

I'm not saying he actually is an expert, but he could be an expert and still lie for any number of reasons.

stymaar · 2026-05-26T18:56:10 1779821770

Elon is a specialist of lying about stuff he invested billions in to make it look more valuable than it is (he's been doing that for Tesla for years). It's not a lack of expertise, it's the lack of any sense of integrity (and self respect).

He's lagging in the AI race despite having tons of compute available, so he tries to make a narrative about how it's not that the model is behind, it's just smaller than the competition.

xbmcuser · 2026-05-26T16:14:35 1779812075

It's not like the non-frontier models aren't improving. If someone can use deepseek to get 90% of the work done for $100 then pay another $100 to anthropic or openai to complete it, I think they would rather do that than pay anthropic or openai $1000.

jonfromsf · 2026-05-27T15:19:21 1779895161

Yes, for indie developers and small startups. Large corps won't want their code /email/etc data being looked at by the Chinese government.

LUmBULtERA · 2026-05-27T15:55:47 1779897347

For Deepseek and other openweight models, you can use non-Chinese hosted infrastructure that offer zero data retention and still save a whole lot of money. A large corp could even host their own Deepseek v4.0 Flash model internally for some basic work.

runtime_terror · 2026-05-26T15:53:07 1779810787

> The subscription token price is 10x-40x cheaper than API pricing

This is a temporary phenomenon. Expect either drastic price increases or draconian throttling or both in the coming months.

These companies are operating at huge losses and have hundreds of billions in liabilities and commitments. They need to turn on the money faucet sooner rather than later.

Npovview · 2026-05-26T16:08:08 1779811688

Even with increased prices, AI enables velocity both in development and bug fixing. Would companies want that? If prices are biting the company, I think companies will route all development and bug fixing requests through a few superperformer developers with complete knowledge of the different components within the company (they will be the Queen Bees holding the company on their heads). The rest of the company will be tasked with requirement gathering, specs cleaning, disambiguation and so on (worker bees).

runtime_terror · 2026-05-26T19:55:39 1779825339

So kinda like how stuff is now at a lot of big companies? I've worked at many different companies and almost always there are a few out-performers and a lot of people who are just good enough not to get fired (no hate, power to them lol).

We're already seeing companies slash their AI budgets. I expect that will increase till we hit more of an equilibrium.

Npovview · 2026-05-27T14:05:10 1779890710

I think people will start measuring the ratio of (features * time taken to implement) to tokens consumed and then redistribute token budgets to developers. This will measure how effective/efficient people are with LLMs.

0xbadcafebee · 2026-05-27T01:44:55 1779846295

Most software development teams are pushing back on the deluge of bad changes from AI tools and are moving slower again to regain trust and stability. It is likely that future software development will not actually be higher velocity.

Npovview · 2026-05-27T02:20:15 1779848415

Bad changes will be eliminated because better people are using the AI tools. They will reduce cost as well as slop.

kakacik · 2026-05-27T12:32:07 1779885127

Yes and we will all hold hands together singing and bring world peace from now on forever. Back in reality, I can see some obstacles without even trying

Npovview · 2026-05-27T14:11:51 1779891111

Do you want 60% of people to be employed in farming? I am being rhetorical because that is what your sarcasm implies. Today only 2% of people work in farming, and they support so many people in America.

0xbadcafebee · 2026-05-27T11:41:22 1779882082

The models are the same whether you're smart or dumb

curt15 · 2026-05-27T00:23:40 1779841420

> Even with increased prices, AI enables velocity both in development and bug fixing.

What about human understanding of the codebase that's essential to any project's long term health? Even "superperformer developers" eventually leave the company.

Npovview · 2026-05-27T13:57:12 1779890232

Ask multiple AIs (if you can't trust one) to explain the project.

DougN7 · 2026-05-26T18:34:15 1779820455

From what I understand, that is sort of how IBM Bob works - multiple models behind the scenes and they route the request to the model that will handle it best at the lowest price.

otabdeveloper4 · 2026-05-27T10:54:02 1779879242

> AI enables velocity both in development and bug fixing

Or so they say. You'll have to trust those vibes blindly, because double-checking these claims apparently makes you an anti-science luddite.

8note · 2026-05-27T05:51:40 1779861100

the alternative is that api prices change to be more in line with deepseek's

runtime_terror · 2026-05-27T16:08:49 1779898129

So then how do these companies return profits to investors?

anthonypasq · 2026-05-26T16:02:02 1779811322

There's recent reporting that Anthropic will be profitable this quarter...

edit: I see in other comments on this thread you think Ed Zitron is a reliable pundit so that explains everything.

runtime_terror · 2026-05-26T19:59:38 1779825578

How will it be profitable, really?

You can dismiss Ed (and me vicariously) but what's your compelling evidence to counter their extremely uphill battle towards profitability?

Either way it will be very interesting to see their S1 when they try and IPO.

If it's anything like SpaceX's then I suspect my post will age better than yours.

brookst · 2026-05-27T02:13:33 1779848013

I sincerely doubt Anthropic’s IPO will say that their AI business is only 2% of their future revenue, and they’re bundling in totally unrelated, unprofitable things they expect to account for 98%.

runtime_terror · 2026-05-27T16:10:59 1779898259

I'm not sure what you're talking about or referring to...

I haven't heard anyone claim their S1 will show that, only that it will show how poorly their revenue figures look against their costs.

brookst · 2026-05-27T16:37:06 1779899826

SpaceX's IPO docs say that launch + Starlink will be 2% of their revenue opportunity, with enterprise AI being 98%.

alfiedotwtf · 2026-05-26T16:07:57 1779811677

Incentives matter…

If prices keep going up, watch for companies to exit frontier models and go to local llama.cpp instances for 6-month-ago SOTA, with the flex of being housed within the office - no more privacy leakage, no more price gouging.

To be honest, I’m not sure why a Y-Combinator backed company hasn’t come out yet flooding the market with highly capable OPAI (pronounced “Oh-pah” as in what Greeks shout as they drink shots), which stands for “On-Prem AI”

… yes, I just made up OPAI right now lol

overfeed · 2026-05-26T18:44:23 1779821063

> I’m not sure why a Y-Combinator backed company hasn’t come out yet flooding the market with highly capable OPAI

If we momentarily disregard the fact that YC itself owns billions of dollars worth of OpenAI shares[1], YC would plan to find demo-day investors willing to drive down the value of frontier labs. The coöpetition among VCs and the existing web of AI investments will mean no VC will be interested in investing in local AI...until after the frontier labs IPO.

1. Thanks to the self-dea^w foresight of former YC president Sam Altman

nicoburns · 2026-05-26T20:07:25 1779826045

> If prices keep going up, watch for companies to exit frontier models and go to local llama.cpp instances for 6-month-ago SOTA, with the flex of being housed within the office - no more privacy leakage, no more price gouging.

That or just hiring people to do the work! I hear rumours that this is already starting to happen in some places (perhaps those that were a little overzealous with AI-hype driven layoffs).

alfiedotwtf · 2026-05-26T22:17:02 1779833822

Fool me once… I think applying for jobs at a company that only within the last 12 months shed thousands of people “because of AI” should be seen as laughable, and employees collectively refusing to work there should be seen as the norm

runtime_terror · 2026-05-26T19:57:36 1779825456

I do think many will move to lower cost models or self hosted over the next few years as prices balloon. And the privacy/control story is compelling.

If we're able to see some big increases in hardware capabilities that can be self-hosted, that will be an accelerant.

That said, most companies just want to pay a provider to delegate responsibility in exchange for cost and control.

lelanthran · 2026-05-26T14:45:49 1779806749

> When discussing LLM pricing, people are missing the plot. [ ... snipped ...] Your 90$ Claude subscriptions give you close to $1000 to $4000 in equivalent API token pricing.

And you think it is unreasonable to consider this unsustainable?

wongarsu · 2026-05-26T15:29:38 1779809378

Depends on what their actual costs are. Either they are losing lots of money on subscriptions, or they make absolute bank on API pricing.

Looking at the pricing of 1-2T models like Kimi or DeepSeek on the open market, I'm tempted to assume that inference costs are closer to subscription pricing than to API pricing.

Especially considering that subscriptions a) distribute load over time via rate limits, and b) will include a lot of users who get only a fraction of the possible value, whether they are on a personal account where they are on the rate limit on the weekend but barely use it during the week, or are corporate users who were issued an account they rarely use. Subscription prices are usually measured on the average case, not the most extreme value a power user can get out of it
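The "priced on the average case" argument can be sketched with a weighted average. The subscriber mix and dollar figures below are entirely made up for illustration and are not data about any provider; the function name is hypothetical too.

```python
def average_api_equivalent(segments):
    """segments: list of (share_of_subscribers, monthly_api_equivalent_usd).
    Returns the usage-weighted average API-equivalent value per subscriber."""
    total_share = sum(share for share, _ in segments)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("subscriber shares must sum to 1")
    return sum(share * usage for share, usage in segments)


# Hypothetical mix: mostly light users, a few who hit the rate limits.
segments = [
    (0.70, 20.0),     # rarely-used accounts
    (0.25, 150.0),    # regular users
    (0.05, 2000.0),   # power users near the limits
]
average = average_api_equivalent(segments)
print(f"average API-equivalent usage: ${average:.2f} per subscriber")
```

With a mix like this, the average sits far below what the heaviest users extract, so the headline "$1000 to $4000 of tokens" figure says little about the provider's cost per subscriber unless the real distribution is known.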

runtime_terror · 2026-05-26T15:57:12 1779811032

> I'm tempted to assume that inference costs are closer to subscription pricing than to API pricing

So just going on vibes?

While some people don't like his content, Ed Zitron shows a lot of evidence for your assumption being very wrong.

These companies are bleeding cash at ungodly rates. It's likely their API pricing is still subsidized if you look at their overall financial picture.

Related, there's a good reason those API prices keep going up a lot every new version and it's not just because the models are better.

wongarsu · 2026-05-26T16:24:51 1779812691

Selling inference for more than inference costs is not incompatible with bleeding cash at ungodly rates. They do in fact pay ungodly amounts of cash for other things, like training, marketing, etc. Heck, you can bleed cash while being profitable (in the accounting sense)

Also, API prices going up a lot every new version is more an OpenAI thing, and even there it's a recent trend: GPT 5.0 was a big price drop compared to 4.1, and 4.1 was cheaper than 4o, which itself got a price cut at some point and is cheaper than 4. Meanwhile Anthropic's API pricing stayed stable for many versions, then got slashed to a third with the 4.2 release and has stayed at that level since.

runtime_terror · 2026-05-26T19:53:23 1779825203

But explain to me how these companies will recoup these costs outside of increasing inference pricing?

Their business model is selling inference but the training and other costs have to be accounted for somehow. Unless I'm missing something obvious, inference costs must go up drastically if these companies are going to survive beyond the subsidy stage.

wongarsu · 2026-05-26T20:21:02 1779826862

Sell more. The hope is that there is a huge addressable market that includes huge per-worker demand in almost all white collar work and lots of inference in people's private lives

If that doesn't work, then yes, then prices will have to go up

0xffff2 · 2026-05-26T21:29:06 1779830946

Both anecdotally for myself and from what I'm reading in the news, it seems just as likely that AI usage has already largely peaked.

There was a lot of hype and exploration of capabilities, but models aren't evolving fast enough to keep that going, so I'm settling down into a familiarity with what an LLM can and can't do that means I am using them less overall than I was 6 months ago when I was throwing everything under the sun at it just to see what happened.

Without either new model breakthroughs or dramatically _lower_ costs, I will be very surprised if the ultimate market doesn't end up within an order of magnitude of where it is today.

d1sxeyes · 2026-05-27T11:53:16 1779882796

> AI usage has already largely peaked.

I think this is minimally likely. While as individuals on the bleeding edge, we're perhaps using these tools less and less, and our echo chamber reinforces that, the penetration of AI into the normal corporate workplace is still very low - emails rewritten with ChatGPT, meeting notes summaries generated by default, etc. There are a million use cases for LLMs which are not yet built out. The tokenmaxxers will begin using AI less, but the penetration into the mass market will continue at a huge velocity.

runtime_terror · 2026-05-27T16:16:16 1779898576

I agree that more uses will be found and that maybe we're not at the peak. But it also seems very clear a few players have been actively working to inflate usage numbers by margins that might take a while to replace with legitimate uses

runtime_terror · 2026-05-27T16:14:37 1779898477

Exactly. Like how Meta has a "blow our money on LLMs" leaderboard. Seems like a few companies are attempting to inflate hype enough so all the investors can exit without losing their heads.

Reminds me of the crypto hype but where the hype agents are some of the largest companies in the world.

runtime_terror · 2026-05-27T16:12:59 1779898379

Yeah from my understanding they'll need to create a few trillion dollars more demand to break into profitability if we look at all the debt/obligations/contracts

HDThoreaun · 2026-05-27T05:53:45 1779861225

Obviously they need more paying users. The entire game in tech is taking advantage of (comparatively) low marginal costs to pay off capex once you corner the market

runtime_terror · 2026-05-27T16:19:21 1779898761

I do think that's at least part of the strategy. The problem is that we've never seen a single product category so hyped in history, literally trillions of dollars invested. To recoup that, some not so trivial miracles will need to happen.

HDThoreaun · 2026-05-27T17:21:34 1779902494

I think that within 5-10 years most white collar workers around the world will be paying for AI assistants. There are 1.2-1.3 billion such people to sell AI to, so getting more users doesn't really seem like a miracle to me. I do think convincing everyone to use expensive proprietary models instead of open ones hosted cheaply by third parties will be a minor miracle for the AI labs. Definitely not out of the question though.

Forgeties79 · 2026-05-26T15:40:46 1779810046

Considering not one company is in the black yet I don’t really know how we can say anyone is making bank, unless we want to count absurd levels of VC funding (now slowing down) I guess.

wongarsu · 2026-05-26T15:47:46 1779810466

I am conveniently not counting training costs (since they add no marginal costs, selling more tokens doesn't impact them), and I'm counting hardware and DC costs only as amortized.

Of course they do have to "make bank" in some way to offset the insane training costs. But whether they go for high prices or high volume, or offer some services as a loss leader to drive profits elsewhere, is somewhat orthogonal to that.

anthonypasq · 2026-05-26T16:03:13 1779811393

https://www.wsj.com/tech/ai/mind-blowing-growth-is-about-to-...

dminik · 2026-05-26T18:54:49 1779821689

Does Anthropic really expect to double their income without also doubling their expenses?

wongarsu · 2026-05-27T08:28:51 1779870531

There we go back to the original question: are subscriptions profitable, API pricing wildly profitable, and they just lose all that money on fixed costs like model training; or do they actually barely make money on inference?

That's why talking about the profitability of inference without accounting for model training is interesting: it is the deciding factor in whether more customers would help get them into the green.

dminik · 2026-05-27T13:29:54 1779888594

Without actual data I don't know. My gut feeling is that they overall lose money on subscriptions (and especially the free tier that accounts for 95% of all users). And make thin profit (~5%) on API pricing.

But it's just that. A gut feeling.

i2km · 2026-05-27T07:51:05 1779868265

This is one of the things people miss. If they double their customers, of course they double their expenses. Unlike software, the marginal cost here is still high.

dminik · 2026-05-27T13:34:16 1779888856

I mean, it's possible that with the new datacenter from SpaceX, they could onboard more users than it costs them to rent. That's fair. But I kind of doubt that.

One thing that really stinks to me is that various AI boosters have been claiming insane profit margins (40%, 50%, ...), yet apparently Anthropic stands to (possibly) make $500M profit on $11B in expenses. That's clearly nowhere near 50%. Not to mention that they're not making a profit on inference now.

So where do people get this confidence to pull random numbers from?
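For what it's worth, the margin implied by those two figures can be checked in a few lines. This is only a sketch: it assumes revenue is simply profit plus expenses, which is my reading of "$500M profit on $11B in expenses" and not a reported number.

```python
def profit_margin(profit_usd: float, expenses_usd: float) -> float:
    """Profit as a fraction of revenue, assuming revenue = profit + expenses."""
    revenue_usd = profit_usd + expenses_usd
    return profit_usd / revenue_usd


# Figures quoted in the comment above: $500M profit on $11B in expenses.
margin = profit_margin(0.5e9, 11e9)
print(f"{margin:.1%}")  # 4.3%
```

Under that assumption the margin is in the low single digits, an order of magnitude below the 40-50% figures.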

Forgeties79 · 2026-05-26T21:44:22 1779831862

I’ve been hearing that Anthropic is on the verge of profitability for probably a year straight. Until all the companies agree to stop the training arms race I just don’t see how it’s in the cards.

Forgeties79 · 2026-05-26T16:18:48 1779812328

Let’s see it first. And without omitting training/infrastructure costs at that. Until then my comment is still accurate.

anthonypasq · 2026-05-26T17:09:48 1779815388

It's a private company, what exactly do you expect to 'see'?

Forgeties79 · 2026-05-26T18:09:16 1779818956

Anthropic IPO's in less than 5 months and I guarantee you any company that officially is in the black will proudly shout it from the rooftops.

anthonypasq · 2026-05-26T19:20:13 1779823213

> Anthropic IPO's in less than 5 months

Pure speculation. About as valuable as my linked WSJ reporting, I suppose. Given that's the case, maybe you shouldn't claim so confidently that they are money incinerators.

Forgeties79 · 2026-05-26T19:54:51 1779825291

“pure speculation” is a bit unfair.

Back to the point: No one is profitable yet, which I think we both agree is accurate. If you are going to lean on “they will be soon” then it’s fair to say they’re going to IPO soon.

Ease off the gas. We’re just discussing a tech company.

z2 · 2026-05-26T14:59:18 1779807558

And the direction is definitely towards removing that subsidy really soon. We can see it with OpenAI's shift to API-equivalent pricing for enterprise customers last month. Anecdotally my company saw OpenAI credit usage grow 2x with stable use across the ChatGPT platform, which is pretty terrifying considering just 2% of the company uses Codex.

For context, ChatGPT business subscriptions give you a fixed pool of credits to use, after which you get billed a la carte at inflated rates (1.75x the API price), or, if you don't want to pay, access to everything except the non-reasoning models is turned off for the month.
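A minimal sketch of that overage rule, to make the arithmetic concrete. The function name, the credit counts, and the per-credit API price are all hypothetical; only the 1.75x multiplier comes from the description above.

```python
def overage_charge(credits_used: float, pooled_credits: float,
                   api_price_per_credit: float,
                   overage_multiplier: float = 1.75) -> float:
    """Extra charge for usage beyond the pooled credits.

    Usage inside the pool costs nothing extra; anything beyond it is
    billed at the API price times the overage multiplier.
    """
    extra_credits = max(0.0, credits_used - pooled_credits)
    return extra_credits * api_price_per_credit * overage_multiplier


# Hypothetical numbers: a pool of 1,000 credits at $0.01 per credit (API price).
print(overage_charge(1_000, 1_000, 0.01))  # usage exactly at the pool: no overage
print(overage_charge(2_000, 1_000, 0.01))  # usage doubled: 1,000 credits of overage
```

With a fixed pool, a 2x jump in credit usage moves everything above the pool straight into the 1.75x bucket, which is why the growth described above is worrying.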

We also tried Claude Enterprise, which was unusable as people blew through their monthly limits in a matter of hours.

onesingleblast · 2026-05-27T20:59:52 1779915592

So when you need an LLM in your backend, scrape Claude Code instead of using the API :)

stingraycharles · 2026-05-26T15:09:35 1779808175

Also, your local hardware is in no way capable of running the types of models that the cloud providers do; it’s just not economically feasible, and it never will be.

ajb · 2026-05-27T08:39:11 1779871151

SanDisk has designed a flash equivalent to HBM, which has 1.6TB/s of bandwidth. I expect that it will be available initially to server manufacturers only, but once supply ramps up will be built into individual machines. At that point it will be practical to run local inference on much larger models. Of course, maybe the SOTA providers will find some way to use even larger ones, but it seems like the returns to scale aren't as much as they were.

bachmeier · 2026-05-26T16:04:26 1779811466

Very much dependent on the situation. For many business tasks, local hardware is good enough. But what a lot of folks overlook when saying these things is that (a) workers do more than run AI models on a piece of hardware, (b) significant computer hardware is already sitting idle outside normal work hours, when it can be running batch jobs, and (c) employees can share local hardware.

adrian_b · 2026-05-26T18:16:23 1779819383

Depends on what you mean by "economically feasible".

Even very cheap mini-PCs and laptops can run any of the models run by cloud providers, albeit at a much lower speed (i.e. with the weights stored on SSDs).

Whether such a low speed is useful depends on the application. For something like a coding assistant or bug scanning, an instant response is desirable, but certainly not necessary.

christina97 · 2026-05-26T18:56:28 1779821788

The SSD would wear out in days while the laptop generates two responses a day. This is like saying you could power your home with AA batteries, yes technically you could but in practice entirely infeasible.

adrian_b · 2026-05-26T22:27:45 1779834465

There is no wear on the SSDs, because the weights are just read, they are not written during inference.

For model training, the requirements are very different, and the training of a big LLM cannot be done with home equipment. On the other hand, inference can be done on almost any PC, even for LLMs with thousands of billions of parameters, just very slowly.

The only problem is that inference becomes limited by the SSD read throughput. Most of the cheap new personal computers available today can read from only 2 SSDs simultaneously (if there are more, they share a read path), typically 1 PCIe 5.0 SSD and 1 PCIe 4.0 SSD. This has an upper throughput limit of 24 GB/s, with 15 to 20 GB/s achievable in practice.

Then the speed in tokens/s is limited by the amount of weights that must be read per inference cycle. The ratio between output tokens and the amount of weights that must be read can be improved by various methods, like batching multiple tasks or using speculative decoding.
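As a back-of-the-envelope sketch of that bound: tokens/s is roughly read throughput divided by the weights read per inference cycle, times however many tokens each cycle yields. The 40 GB per cycle and the batch of 8 below are hypothetical placeholders, not measurements of any particular model; the 20 GB/s is the practical figure mentioned above.

```python
def tokens_per_second(read_gb_per_s: float,
                      weights_read_gb_per_cycle: float,
                      tokens_per_cycle: float = 1.0) -> float:
    """Upper bound on throughput when inference is limited by SSD reads.

    Each inference cycle has to stream `weights_read_gb_per_cycle` from
    storage; batching or speculative decoding raises `tokens_per_cycle`.
    """
    cycles_per_second = read_gb_per_s / weights_read_gb_per_cycle
    return cycles_per_second * tokens_per_cycle


# Hypothetical: 40 GB of weights read per cycle at 20 GB/s.
print(tokens_per_second(20, 40))                      # 0.5 tokens/s, one task
print(tokens_per_second(20, 40, tokens_per_cycle=8))  # 4.0 tokens/s across a batch of 8
```

The batch figure is aggregate throughput: each individual task still sees one token per cycle.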

jurgenburgen · 2026-05-27T06:30:09 1779863409

Does more RAM increase performance? This approach sounds like it could eventually be fast enough for local use as hardware and models improve.

zozbot234 · 2026-05-27T07:46:34 1779867994

Faster SSD access improves performance more than RAM does, at least until all of the model is being cached in RAM. So older and cheaper HEDT platforms with lots of PCIe lanes to attach storage to are best for this approach.

jyounker · 2026-05-26T19:14:58 1779822898

Weights are write-once data.

zozbot234 · 2026-05-26T15:37:18 1779809838

It can run open-weight models that are roughly as capable. It's going to be slow unless you're using actual datacenter hardware, but they'll run.

colonCapitalDee · 2026-05-26T15:40:52 1779810052

"roughly" is doing a lot of heavy lifting there

adrian_b · 2026-05-26T18:24:18 1779819858

The difference between datacenter hardware and cheap personal hardware is not in what can be run and what cannot be run.

Anything can also be run on a cheap computer.

The difference is in speed. A cheap computer may run a big model up to a few orders of magnitude slower than datacenter hardware, depending on whether the LLM is small enough to fit in GPU memory, small enough to fit in CPU memory, or so big that it must spill onto SSDs.
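The three cases can be written down as a simple check. This is an illustrative sketch only: the function name and the machine sizes (16 GB of GPU memory, 64 GB of RAM) are made up for the example, and it ignores the extra memory needed for activations and caches.

```python
def weight_tier(model_gb: float, gpu_gb: float, ram_gb: float) -> str:
    """Return the fastest memory tier that can hold all of the weights."""
    if model_gb <= gpu_gb:
        return "gpu"
    if model_gb <= ram_gb:
        return "ram"
    # Too big for GPU or CPU memory: weights are streamed from SSD.
    return "ssd"


# Hypothetical machine: 16 GB of GPU memory, 64 GB of system RAM.
for size_gb in (8, 40, 600):
    print(size_gb, weight_tier(size_gb, gpu_gb=16, ram_gb=64))
```

Every size gets an answer; what changes from tier to tier is only how slow the run is.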

Depending on the application, the tradeoff between run time and run cost may happen to favor using local hardware, despite a much slower speed.

There are plenty of applications where running them at negligible cost as an overnight job can be preferable to obtaining faster results at a very high price. One example is scanning for bugs in a mature code base using a great number of different open-weights LLMs, which can achieve bug coverage similar to that of a single, but overpriced and unavailable, SOTA LLM, e.g. Mythos.

stingraycharles · 2026-05-26T22:29:41 1779834581

> The difference between datacenter hardware and cheap personal hardware is not in what can be run and what cannot be run.

You do realize that a model like Opus is (estimated to be) around 5T parameters, and uses around 5TB of GPU memory?

These kinds of things are just impossible to run locally.
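For reference, the footprint of the weights alone is just parameter count times bytes per parameter. The sketch below uses the 5T estimate from above (an estimate, not a published figure); the bytes-per-parameter values are generic precisions, not anything known about that model.

```python
def weight_footprint_tb(num_params: float, bytes_per_param: float) -> float:
    """Size of the weights alone, in terabytes (1 TB = 1e12 bytes)."""
    return num_params * bytes_per_param / 1e12


# 5T parameters is the estimate quoted above, not a confirmed number.
for label, bytes_per_param in (("16-bit", 2), ("8-bit", 1), ("4-bit", 0.5)):
    print(label, weight_footprint_tb(5e12, bytes_per_param), "TB")
```

So "5T parameters, around 5TB" corresponds to roughly one byte per parameter, before counting any memory for caches or activations.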

adrian_b · 2026-05-26T22:44:34 1779835474

This kind of thing can certainly be run locally, even on a small mini-PC, like a NUC, or even on a laptop, with the weights stored on SSDs.

As I have said, the problem is not that they cannot be run, but that they may run more slowly than is acceptable for a given application. Depending on the model, the speeds reported for inference with weights stored on SSDs vary from one token every few seconds to at most a few tokens per second.

Computers could solve relatively huge problems even in the early days of vacuum tube computers, when main memories were measured in kilobytes. At that time, nobody expected the data needed to solve a problem to fit inside main memory, or even in the next tier of memory (magnetic drums or magnetic disks); the really big problems were solved by a great number of passes over data stored on magnetic tapes.

An LLM whose inference could not be run on a small mini-PC would have to be one hundred times bigger than the biggest existing SOTA LLMs.

Any LLM that exists today can be run on almost any PC, just extremely slowly in comparison with datacenter hardware.

dns_snek · 2026-05-27T07:59:43 1779868783

When people say that you "can't do" something, what they actually mean is that it's completely impractical (if not impossible).

zozbot234 · 2026-05-27T11:02:57 1779879777

Whether something is "impractical" depends on your expectations. High-latency unattended inference is definitely viable, even though it doesn't align much with what's being run in hyperscale datacenters.

dns_snek · 2026-05-27T12:23:30 1779884610

I'd like to meet the person who's been using a 1 token/second system as their primary LLM for at least a few weeks. Anyone?

I think 1 token/second is optimistic here - and even then it's over 11 days per million tokens.
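The arithmetic behind that figure, as a quick sketch (the helper name is just for illustration):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def days_to_generate(num_tokens: float, tokens_per_second: float) -> float:
    """Wall-clock days needed to generate `num_tokens` at a constant rate."""
    return num_tokens / tokens_per_second / SECONDS_PER_DAY


days = days_to_generate(1_000_000, 1.0)
print(f"{days:.2f} days")  # 11.57 days
```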

devmor · 2026-05-26T16:24:42 1779812682

> it never will be.

Giving strong “640k is enough for anyone” vibes here.

3form · 2026-05-26T20:56:25 1779828985

The 640k statement was absolute, this one is comparative.

Cloud should have more compute and efficiency than local. I wouldn't be 100% sure, as I don't know what I might not be seeing, but still.

Whether that comparative advantage will matter, though, is a completely different question.

devmor · 2026-05-27T00:50:48 1779843048

Gotcha, I think I misunderstood the statement as saying today’s cloud-required will never be local-capable.

cortesoft · 2026-05-26T17:12:08 1779815528

NEVER will be is a pretty big leap. Never is a long time.

protocolture · 2026-05-27T00:45:20 1779842720

> When discussing LLM pricing, people are missing the plot. The subscription token price is 10x-40x cheaper than API pricing. Your 90$ Claude subscriptions give you close to $1000 to $4000 in equivalent API token pricing.

These are loss leaders that will not be maintained over the long term. Already we see moves to restrict their usage and redirect people back to API pricing.
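The multiplier implied by the quoted dollar figures is easy to check. A small sketch, using only the numbers from the quote ($90 of subscription, $1000 to $4000 of API-equivalent usage); the function name is made up for illustration.

```python
def api_equivalent_multiplier(subscription_usd: float,
                              api_equivalent_usd: float) -> float:
    """How many dollars of API-priced usage one subscription dollar buys."""
    return api_equivalent_usd / subscription_usd


low = api_equivalent_multiplier(90, 1000)
high = api_equivalent_multiplier(90, 4000)
print(f"{low:.1f}x to {high:.1f}x")  # 11.1x to 44.4x
```

That is roughly the 10x-40x range in the quote, and it is the gap that closes if usage gets redirected back to API pricing.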

try-working · 2026-05-27T00:13:37 1779840817

DeepSeek and Xiaomi are so cheap there's no need to get a plan. Just use the API.

jason_s · 2026-05-27T03:42:03 1779853323

something something something China something something intellectual property something something....

noman-land · 2026-05-27T04:51:38 1779857498

You can just say the words instead of implying their meaning and letting everyone fill in the gaps themselves.

otabdeveloper4 · 2026-05-27T10:52:33 1779879153

The subscription plans are the "first hit is free" plans. They're not gonna last, so don't build anything serious based on them.

cyanydeez · 2026-05-26T15:10:49 1779808249

Isn't the plot that it's like an infinite bikeshed, but 10% of the bikesheds are actually trailer parks, and when you finally realize it's a trailer park and not a bikeshed you're down $10-100 because its token gen is faster than you can actually validate?

Some might say the price wouldn't be great if you could actually process and validate it...

kelseyfrog · 2026-05-26T15:17:09 1779808629

> The quality of the model “operator” makes a massive difference in the outcomes.

My hunch is that this is the source of much of the variability in outcomes, upstream of HN commenters claiming extremes from "This model changes everything!" to "This [same] model is crap."

We haven't operationalized what it means to "be good at prompting," nor developed proxies/heuristics/shibboleths for assessing prompting skill. There's community skepticism over whether prompting skill even exists. Besides, even if prompting skill is real, who wants to hear, "Actually you kinda suck at prompting"?

danielmarkbruce · 2026-05-26T16:20:44 1779812444

It's 100% this. Many people suck at prompting. It's likely that habits from search are ingrained. But in general some people are just so bad at it.

jyounker · 2026-05-26T19:16:23 1779822983

Prompting is just writing specification documents. A lot of people are very bad at this. I suppose that more to the point, a lot of people are just bad at writing.

danielmarkbruce · 2026-05-26T23:07:11 1779836831

This is probably correct. Perhaps prompting just brings out the very worst in specification.

FireCrack · 2026-05-27T03:56:22 1779854182

IDK if it's just me, but I also find Claude, whether it be the model or the harness, is a lot more "forgiving" of poor prompts than many of the open models.

latexr · 2026-05-26T18:19:28 1779819568

According to Google, “there’s no wrong way to prompt”.

https://www.youtube.com/watch?v=9bBfYX8X5aU&t=48s

knollimar · 2026-05-26T21:21:07 1779830467

No wrong way to [consume thing I sell that you'll consume more of if you do it poorly]

djeastm · 2026-05-26T21:41:11 1779831671

Ehhh, their incentive in their marketing is to get normal people to not be intimidated by the big bad AI.

Power users are always going to have to take the messaging companies send out to the masses with a grain of salt.