> When they don't have the tools to exercise that capability, it's a distinction without any practical impact.
It has huge practical impact.
If a human doesn't currently have the tools to exercise the capability, you can help them get those.
This is especially true when the tools in question are things like "enough time to actually think about their work, rather than being forced to rush through everything" or "enough mental energy in the day to be able to process and learn, because you're not being kept constantly on the edge of a breakdown." Or "the flexibility to screw up once in a while without getting fired." Now, a lot of managers refuse to give their subordinates those tools, but that doesn't mean that there's no practical impact. It means that they're bad managers and awful human beings.
An LLM will just always be nondeterministic. If you're the LLM "worker"'s "boss", there is nothing you can do to help it do better next time.
> they can sometimes recognise they made a mistake and correct it.
...And other times, they "recognize they made a mistake" when they actually had it right, and "correct it" to something wrong.
"Recognizing you made a mistake and correcting it" is a common enough pattern in human language—ie, the training corpus—that of course they're going to produce that pattern sometimes.
A generic "you" might, I personally don't have that skill.
But then, I've never been a manager.
> An LLM will just always be nondeterministic.
This is not relevant; humans are also nondeterministic, at least practically speaking. The theory doesn't matter much, since we can't duplicate our brains and test ourselves ten times on the exact same input without each previous run affecting the next.
> If you're the LLM "worker"'s "boss", there is nothing you can do to help it do better next time.

Yes there is: prompt engineering.
> "Recognizing you made a mistake and correcting it" is a common enough pattern in human language—ie, the training corpus—that of course they're going to produce that pattern sometimes.
Yes. This means that anthropomorphising them leads to a useful prediction.
For similar reasons, I use words like "please" and "thank you" with these things, even though I don't actually expect these models to have constructed anything resembling real human emotional qualia within them: humans do better when praised, so I have reason to expect that any machine that has learned to copy human behaviour will likely also do better when praised.
> This is not relevant, humans are also nondeterministic.
I mean, I suppose one can technically say that, but, as I was very clearly describing, humans both err in predictable ways, and can be taught not to err. Humans are not nondeterministic in anything like the same way LLMs are. LLMs will just always have some percentage chance of giving you confidently wrong answers. Because they do not actually "know" anything. They produce reasonable-sounding text.
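To make that concrete: unlike a human worker, you can replay the exact same input against an LLM as many times as you like and watch the answers disagree with each other. A rough sketch of what I mean (assuming the OpenAI Python client; the model name and prompt are just placeholders, not anything specific):

```python
# Rough sketch: replay the exact same prompt N times and count distinct answers.
# Assumes the OpenAI Python client (v1.x); model name and prompt are placeholders.
from collections import Counter

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "In one word: which number is larger, 9.11 or 9.9?"

answers = Counter()
for _ in range(10):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": PROMPT}],
        temperature=1.0,      # ordinary sampling, not greedy decoding
    )
    answers[resp.choices[0].message.content.strip()] += 1

# Identical input, yet the tally is often split across several answers,
# and nothing here tells you which of them is the confidently wrong one.
print(answers)
```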
> Yes there is
...And no matter how well you engineer your prompts, you cannot guarantee that the LLM's outputs will be any less confidently wrong. You can probably make some improvements. You can hope that your "prompt engineering" has some meaningful benefit. But that is nowhere near guaranteed, and every time the models are updated, you run a very high risk that your "prompt engineering" tricks will completely stop working.
None of that is true with humans. Human fallibility is wildly different from LLM fallibility, is very well understood overall, and is highly and predictably mitigable.
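And to be clear about what "prompt engineering" actually buys you in practice: about the best you can do is pin a dated model snapshot and keep a small regression check over your prompts, which only tells you after the fact when something has broken. A rough sketch (assuming the OpenAI Python client; the snapshot name and test cases are made-up placeholders):

```python
# Rough sketch of a prompt regression check against a pinned model snapshot.
# Assumes the OpenAI Python client (v1.x); the snapshot name and test cases
# are placeholders, not anything from this thread.
from openai import OpenAI

client = OpenAI()
PINNED_MODEL = "gpt-4o-2024-08-06"  # a dated snapshot, so upstream updates don't silently swap the model

# (prompt, substring the reply must contain)
CASES = [
    ("Reply with only the word YES.", "YES"),
    ("What is 2 + 2? Answer with a single digit.", "4"),
]

def run_case(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=PINNED_MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # as repeatable as the API allows, though still not a guarantee
    )
    return resp.choices[0].message.content or ""

failures = []
for prompt, expected in CASES:
    reply = run_case(prompt)
    if expected not in reply:
        failures.append(prompt)
        print(f"REGRESSION: {prompt!r} -> {reply!r}")

print("prompts still behave" if not failures else f"{len(failures)} prompt(s) broke")
```

That's the shape of the mitigation, and it's monitoring, not a guarantee.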