It really doesn't matter how "good" these tools feel, or whatever vague metric y...

df2dd · 2026-02-25T01:14:30 1771982070

Indeed. Many of the posts I see on here are hilarious.

Have any of you tried reproducing an identical output, given an identical set of inputs? It simply doesn't happen. It's like a lottery.

This lack of reproducibility is a huge problem and limits how far the thing can go.

tvbusy · 2026-02-25T07:08:46 1772003326

LLMs have randomness baked into every single token they generate. You can try running LLMs locally with the temperature set low, and it immediately feels boring to get the same reply every time. It's the randomness that makes them feel "smart". Put another way, randomness is required for the illusion of intelligence.

df2dd · 2026-02-25T13:46:35 1772027195

I'm fully aware of that. However, this illusion is a dangerous mirage. It doesn't equate to reality. In some cases that's OK. But in most cases it's not, especially in the context of business operations.

tibbar · 2026-02-25T07:05:07 1772003107

Determinism in agents is a complex topic because there are several different layers of abstraction, each of which may introduce its own non-determinism. But yeah, it is going to be difficult to induce determinism in a commercial coding agent, for reasons discussed below.

However, we can start by claiming that non-determinism is not necessarily a bad thing - non-greedy token sampling helps prevent certain degenerate/repetitive states and tends to produce overall higher quality responses [0]. I would also observe that part of the yin-yang of working with the agents is letting go of the idea that one is working with a "compiler" and thinking of it more as a promising but fallible collaborator.

With that out of the way, what leads to non-determinism? The classic explanation is the sampling strategy used to select the next token from the LLM. As mentioned above, there are incentives to use a non-zero temperature for this, which means that most LLM APIs are intentionally non-deterministic by default. And, even at temperature zero LLMs are not 100% deterministic [1]. But it's usually pretty close; I am running a local LLM as we speak with greedy sampling and the result is predictably the same each time.
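To make the sampling point concrete, here is a minimal sketch of temperature sampling versus greedy selection. The logits are made-up numbers and `sample_token` is a hypothetical helper, not any particular library's API; real inference stacks add top-k/top-p filtering and, as noted above, can still drift slightly even at temperature zero.

```python
import math
import random

def softmax(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng):
    """Pick a token index: argmax at temperature 0, weighted random otherwise."""
    if temperature == 0:
        # Greedy sampling: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

def run(seed, temperature, n=10):
    """Draw n tokens from the same (hypothetical) logits with a seeded RNG."""
    rng = random.Random(seed)
    return [sample_token(logits, temperature, rng) for _ in range(n)]

logits = [2.0, 1.5, 0.3]  # hypothetical next-token scores

greedy = run(seed=0, temperature=0, n=20)      # identical pick every time
sampled = run(seed=0, temperature=1.0, n=200)  # a mix of token indices
replay_a = run(seed=42, temperature=1.0)
replay_b = run(seed=42, temperature=1.0)       # same seed, same sequence
```

In this toy setting, fixing the seed also gives repeatable output at non-zero temperature, but hosted APIs do not necessarily expose or honor a seed, so that part should be read as an assumption about a local setup.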

Proprietary reasoning models are another layer of abstraction that may not even offer temperature as a knob anymore [2]. I think Claude still offers it, but it doesn't guarantee 100% determinism at temperature 0 either [3].

Finally, an agentic tool loop may encounter different results from run to run via tool calls -- it's pretty hard to force a truly reproducible environment from run to run.

So, yeah, at best you could get something that is "mostly" deterministic if you coded up your own coding agent that focused on using models that support temperature and always forced it to zero, while carefully ensuring that your environment has not changed from run to run. And this would, unfortunately, probably produce worse output than a non-deterministic model.
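One way to at least check the "environment has not changed" part is to fingerprint everything a run consumed and compare fingerprints across runs. This is only a sketch: the record fields (`model`, `temperature`, `prompt`, `tool_results`) and their values are hypothetical, and it detects input drift rather than preventing it.

```python
import hashlib
import json

def fingerprint(run_inputs):
    """Hash a run's inputs in a canonical form so key order does not matter."""
    canonical = json.dumps(run_inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical record of what one agent run saw.
run_a = {
    "model": "local-model",
    "temperature": 0,
    "prompt": "fix the failing test",
    "tool_results": ["3 passed, 1 failed"],
}
run_b = dict(run_a)                                # same inputs, second run
run_c = dict(run_a, tool_results=["4 passed"])     # a tool call returned something else

same_inputs = fingerprint(run_a) == fingerprint(run_b)
drifted = fingerprint(run_a) != fingerprint(run_c)
```

If the fingerprints match and the outputs still differ, the remaining variation is coming from the model or the serving stack rather than from the tools.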

[0] https://arxiv.org/abs/2007.14966

[1] https://thinkingmachines.ai/blog/defeating-nondeterminism-in...

[2] https://learn.microsoft.com/en-us/azure/ai-foundry/openai/ho...

[3] https://platform.claude.com/docs/en/about-claude/glossary

df2dd · 2026-02-25T13:59:14 1772027954

Appreciate the response. I agree that non-determinism isn't a bad thing. However, LLMs are being pushed as the thing to replace many of the deterministic things that exist in the world - and anyone seen to be thinking otherwise gets punished, e.g. in the stock market.

This world of extremes is annoying for people who have the ability to think more broadly and see a world where deterministic systems and non-deterministic systems can work together, where it makes sense.

tibbar · 2026-02-25T21:58:27 1772056707

Yeah, I think you're right that LLMs are overused. In most cases where a deterministic system is feasible and desirable, it's also much faster and cheaper than using an LLM.

nfg · 2026-02-24T21:00:48 1771966848

> In other words, that usage you like is costing them tons of money

Evidence? I’m sure someone will argue, but I think it’s generally accepted that inference can be done profitably at this point. The cost for equivalent capability is also plummeting.

JohnMakin · 2026-02-24T21:04:39 1771967079

I didn't think there would need to be more evidence than the fact they are saying they need to spend $600 billion in 4 years on $13bn revenue currently, but here we are.

Here you go: https://www.wsj.com/livecoverage/stock-market-today-dow-sp-5...

tibbar · 2026-02-24T21:12:06 1771967526

Right, but if OpenAI wanted to stop doing research and just monetize its current models, all indications are that it would be profitable. If not, various adjustments to pricing, ads, etc. could get it there. However, it has no reason to do this, and like all the other labs it is going insanely into debt to develop more models. I'm not saying that it's necessarily going to work out, but they're far from the first company to prioritize growth over profitability.

mike_hearn · 2026-02-25T09:07:59 1772010479

This meme needs to go in the bin. Loss making companies love inventing strange new accounting metrics, which is one reason public companies are forced to report in standardized ways.

There's no such thing as "profitable inference". A company is either profitable or it isn't.

Let's for a second assume all the labs somehow manage to form a secret OPEC-style cartel that agrees to slow training to a halt, and nobody notices or investigates. This is already hard to imagine with the amount of scrutiny they're under and given that China views this as a military priority. But let's pretend they manage it. These firms also have lots of other costs:

• Staffing and comp! That's huge!

• User subsidies to allow flat rate plans

• Support (including abuse control and handling the escalations from their support bots)

• Marketing

• Legal fees and data licensing

• Corporate/enterprise sales, which is expensive as hell even though it's often worth it

• Debt servicing (!!)

• Generating returns for investors

Inferencing margins have to cover all of those even if progress stops tomorrow, and the RoI to investors likewise has to be very large, so margins can't be trivial. Yet what these firms have said about their margins is very ambiguous. As they're arriving at this statement by excluding major cost components like training, it's not clear what they think the cost of inferencing actually is. Are they excluding other things too, like hw depreciation and upgrades? Are they excluding the cost of the corporate sales/support infrastructure around the inferencing?

tibbar · 2026-02-25T16:37:53 1772037473

To be clear, it's absolutely impossible for OpenAI and the others to stop. The valuation and honestly the global markets depend on them staying leveraged to the hilt. So they're not going to stop. However, the point is that the models are genuinely useful and people pay for them, and if we reset the timeline with a company that has just the current proprietary models, they could turn a profit. That might involve charging more than they do now, etc. But this is much different than OpenAI, specifically, trying to turn a profit today, which wouldn't work for many reasons.

But also, "profitable inference" IS a thing! "Gross margin" is important and meaningful, even if a company has other obligations that mean it's overall not profitable.
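A toy calculation shows how both statements can be true at once. All figures below are invented for illustration and are not anyone's actual financials; the point is only that a healthy gross margin on inference can coexist with an overall loss once training, staffing, sales and the rest are counted.

```python
def gross_margin(revenue, cost_of_revenue):
    """Fraction of revenue left after the direct cost of serving it."""
    return (revenue - cost_of_revenue) / revenue

def operating_income(revenue, cost_of_revenue, operating_expenses):
    """What remains after direct costs and all other operating costs."""
    return revenue - cost_of_revenue - operating_expenses

# Hypothetical figures in arbitrary units.
revenue = 100.0
inference_cost = 40.0   # direct cost of serving requests
other_costs = 90.0      # training, staffing, sales, support, and so on

margin = gross_margin(revenue, inference_cost)
income = operating_income(revenue, inference_cost, other_costs)
```

With these made-up numbers the gross margin is positive while operating income is negative, which is the distinction the two sides of this subthread are arguing over.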

zippothrowaway · 2026-02-24T21:51:49 1771969909

Nope. The only "all indications" are that they say so. They may be making a profit on API usage, but even that is very suspect - compare against how much it actually costs to rent a rack of B200s from Microsoft. But for the millions of people using Codex/Claude Code/Copilot, the costs of $20-$30-$200 clearly don't compare to the actual cost of inference.