You've misunderstood what I'm saying. Regardless of whether LLMs think or not, the sentence "LLMs don't think because they predict the next token" is logically as wrong as "fleas can't jump because they have short legs".
> the sentence "LLMs don't think because they predict the next token" is logically as wrong
It isn't, depending on the definition of "THINK".
If you believe that thought is the process whereby an agent with a world model takes in input, analyses the circumstances, predicts an outcome, and adjusts their behaviour based on that prediction, then the sentence "LLMs don't think because they predict a token" is entirely correct.
They cannot have a world model. They could, in some way, be said to receive sensory input through the prompt, but they are neither analysing that prompt against their own subjectivity, nor predicting outcomes, coming up with a plan, or changing their actions/responses/behaviour because of it.
Any definition of "Think" that requires agency or a world model (which, as far as I know, is all of them) would exclude an LLM by definition.
I think Anthropic has established that LLMs have at least a rudimentary world model (regions of tensors that represent concepts and relationships between them) and that they modify behavior due to a prediction (putting a word at the end of the second line of a poem based on the rhyme they need for the last). Maybe they come up short on 'analyzing the circumstances'; not really sure how to define that in a way that is not trivial.
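To give a concrete sense of what "regions of tensors that represent concepts" cashes out to, here is a minimal linear-probe sketch. It is only illustrative: the activation arrays are random stand-ins for hidden states you would actually extract from a model, and the probe is plain scikit-learn, not anything from Anthropic's tooling. The idea is that if a simple linear classifier on activations predicts the presence of a concept on held-out inputs, that concept is (at least linearly) represented in the activation space.

```python
# Toy linear-probe sketch. The activations below are random stand-ins;
# in practice you would collect hidden states from a real model on
# prompts that do / don't involve the concept, then fit the probe.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_model = 512                                     # hypothetical hidden size

acts_with_concept = rng.normal(0.5, 1.0, size=(200, d_model))
acts_without_concept = rng.normal(-0.5, 1.0, size=(200, d_model))

X = np.vstack([acts_with_concept, acts_without_concept])
y = np.array([1] * 200 + [0] * 200)

probe = LogisticRegression(max_iter=1000).fit(X[::2], y[::2])      # half to train
print("held-out probe accuracy:", probe.score(X[1::2], y[1::2]))   # ~1.0 on this toy data
```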
This may not be enough to convince you that they do think. It hasn't convinced me either. But I don't think your confident assertions that they don't are borne out by any evidence. We really don't know how these things tick (otherwise we could reimplement their matrices in code and save $$$).
If you put a person in charge of predicting which direction a fish will be facing in 5 minutes, they'll need to produce a mental model of how the fish thinks in order to be any good at it. Even though their output will just be N/E/S/W, they'll need to keep track internally of how hungry or tired the fish is. Or maybe they just memorize a daily routine and repeat it. The open question is what needs to be internalized in order to predict ~all human text with a low error rate. The fact that the task is 'predict next token' doesn't tell us very much at all about the internals. The resulting weights are uninterpretable. We really don't know what they're doing, and there's no fundamental reason it can't be 'thinking', for any definition.
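To make the "same task, different internals" point concrete, here is a toy sketch; the class names and the fish dynamics are invented for illustration. Both predictors satisfy the same contract and emit the same kind of output, so nothing about the task itself tells you which one keeps internal state.

```python
# Two predictors of "which way is the fish facing in 5 minutes?" with the
# same interface. One memorizes a routine; the other tracks internal state
# (hunger) and simulates. The output format alone can't distinguish them.
class RoutinePredictor:
    ROUTINE = ["N", "E", "S", "W", "S", "E"]            # memorized schedule

    def predict(self, minute: int) -> str:
        return self.ROUTINE[(minute // 5) % len(self.ROUTINE)]


class ModelBasedPredictor:
    def __init__(self):
        self.hunger = 0.0                               # internal state

    def predict(self, minute: int) -> str:
        self.hunger = min(1.0, self.hunger + 0.1)       # fish gets hungrier
        if self.hunger > 0.7:
            self.hunger = 0.0                           # heads to the feeder
            return "E"
        return "N" if (minute // 5) % 2 == 0 else "S"


for p in (RoutinePredictor(), ModelBasedPredictor()):
    print(type(p).__name__, [p.predict(t) for t in range(0, 30, 5)])
```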
> I think Anthropic has established that LLMs have at least a rudimentary world model
It's unsurprising that a company heavily invested in LLMs would describe clustered information as a world model, but it isn't one. Transformer models, whether for video or for text, don't have the kind of machinery you would need for a world model. They can mimic some level of consistency as long as the context window holds, but that disappears the second the information leaves that space.
In terms of human cognition, it would be like the difference between short-term memory, long-term memory, and being able to see the stuff in front of you. A human can instinctively know the relative weight, direction, and size of objects, and if a ball rolls behind a chair you still know it's there 3 days later. A transformer model cannot do any of those things; at best it can remember the ball behind the chair until enough information comes in to push it out of the context window, at which point it cannot reappear.
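A toy sketch of the context-window point; the window size and the answer() function are made up for illustration and are not any real LLM API. Once the earlier tokens slide out of the window, there is literally nothing left for the model to attend to about them.

```python
# Sliding-window toy: the "model" only ever sees the last MAX_CONTEXT
# tokens. A fact stated early (the ball behind the chair) is simply
# absent from its input once enough new text arrives.
MAX_CONTEXT = 8                      # absurdly small, for illustration

conversation: list[str] = []

def add(text: str) -> None:
    conversation.extend(text.split())

def visible_context() -> list[str]:
    return conversation[-MAX_CONTEXT:]        # everything older is gone

def answer(question: str) -> str:
    # Stand-in for a real model: it can only use what is in the window.
    return "behind the chair" if "ball" in visible_context() else "no idea"

add("the ball rolled behind the chair")
print(answer("where is the ball?"))           # -> behind the chair
add("then we talked about dinner plans for a very long time indeed")
print(answer("where is the ball?"))           # -> no idea
```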
> putting a word at the end of the second line of a poem based on the rhyme they need for the last
That is the kind of work that exists inside its context window. Feed it a 400-page book, which any human could easily read, digest, parse, and understand, give it a single read, and ask it questions about different chapters. You will quickly see it make shit up that fits the information given previously and not the original text.
> We really don't know how these things tick
I don't know enough about the universe either. But if you told me that there are particles smaller than the Planck length and others that went faster than the speed of light, then I would tell you that it cannot happen due to the basic laws of the universe. (I know there are studies on FTL neutrinos and dark matter, but in general terms, if you said you saw carbon going FTL I wouldn't believe you.)
Similarly, transformer models are cool, emergent properties are super interesting to study in larger data sets, adding tools on the side for deterministic work helps a lot, and agentic multi-modal use is fun. But a transformer does not and cannot have a world model as we understand it; Yann LeCun left Facebook because he wants to work on world-model AIs rather than transformer models.
> If you put a person in charge of predicting which direction a fish will be facing in 5 minutes,
What that human will never do is think the fish is gone because it went inside the castle and they lost sight of it. That is something a transformer would do.
Anthropic may or may not have claimed this was evidence of a world model; I'm not sure. I say this is a world model because it is objectively a model of the world. If your concept of a world model requires something else, the answer is that we don't know whether they're doing that.
Long-term memory and object permanence don't seem necessary for thought. A 1-year-old can think, as can a late-stage Alzheimer's patient. Neither could get through a 400-page book, but that's irrelevant.
Listing human capabilities that LLMs don't have doesn't help unless you demonstrate these are prerequisites for thought. Helen Keller couldn't tell you the weight, direction, or size of a rolling ball, but this is not relevant to the question of whether she could think.
Can you point to the speed-of-light analogy laws that constrain how LLMs work in a way that excludes the possibility of thought?
> I say this is a world model because it is objectively a model of the world.
A world model in AI has a specific definition: an internal representation that the AI can use to understand and simulate its environment.
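That is roughly the sense used in model-based RL: a transition function the agent can roll forward internally to evaluate actions before taking any of them. A minimal sketch, with an invented grid world and hand-written transition rules standing in for a learned model:

```python
# World model in the model-based-RL sense: an internal transition function
# the agent can simulate ("what happens if I do X?") without touching the
# real environment, and then use to pick a plan.
from itertools import product

GOAL = (2, 2)
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def world_model(state, action):
    """Predicts the next state for (state, action) -- pure simulation."""
    dx, dy = MOVES[action]
    return (state[0] + dx, state[1] + dy)

def plan(start, horizon=4):
    """Search over imagined futures using only the world model."""
    best = None
    for actions in product(MOVES, repeat=horizon):
        s = start
        for i, a in enumerate(actions):
            s = world_model(s, a)
            if s == GOAL:
                candidate = actions[: i + 1]
                if best is None or len(candidate) < len(best):
                    best = candidate
                break
    return best

print(plan((0, 0)))    # e.g. ('up', 'up', 'right', 'right')
```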
> Long-term memory and object permanence don't seem necessary for thought. A 1-year-old can think, as can a late-stage Alzheimers patient
Both of those cases have long-term memory and object permanence; they also have a developing memory or memory issues, but those issues are not constrained by a context window. Children develop object permanence in their first 8 months, and, much like learning to distinguish their own body from their mother's, that is them developing a world model. Toddlers are not really thinking; they are responding to stimulus: they feel hunger, they cry; they hear a loud sound, they cry. It's not really them coming up with a plan to get fed or to get attention.
> Listing human capabilities that LLMs don't have doesn't help unless you demonstrate these are prerequisites for thought. Helen Keller couldn't tell you the weight, direction, or size of a rolling ball
Helen Keller had an understanding in her mind of what different objects were; she started communicating because she understood the word "water" when her teacher traced it across her palm with a finger.
Most humans have multiple sensory inputs (sight, smell, hearing, touch); she only had one, which is perhaps closer to an LLM. But the things she had that LLMs don't have are agency, planning, long-term memory, etc.
> Can you point to the speed-of-light analogy laws that constrain how LLMs work in a way that excludes the possibility of thought?
Sure, let me switch the analogy if you don't mind. In the Chinese room thought experiment we have a man who gets a message, opens a Chinese dictionary, and translates it perfectly word by word, and the person on the other side receives and reads a perfect Chinese message.
The argument usually revolves around whether the person inside the room "understands" Chinese if he is capable of producing 1:1 perfect Chinese messages.
But an LLM is that man, and what you cannot argue is that the man is THINKING. He is mechanically going to the dictionary and returning a message that can pass as human-written because the book is accurate (if the vectors and weights are well tuned). He is not an agent; he simply does, and he is not creating a plan or doing anything beyond transcribing the message as the book demands.
He doesn't have a mental model of the Chinese language, he cannot formulate his own ideas or execute a plan based on predicted outcomes; he can do nothing but perform the job perfectly and boringly, as per the book.
We absolutely do; we know exactly how LLMs work. They generate plausible text from a corpus. They don't accurately reproduce data/text, they don't think, they don't have a world view or a world model, and they sometimes generate plausible yet incorrect data.
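Concretely, "generate plausible text" means the autoregressive sampling loop; a minimal sketch below, where forward() is just a stand-in for the trained network (here it returns random logits, since the real thing is exactly what's under discussion):

```python
# Autoregressive sampling skeleton: get logits for the next token,
# softmax them into probabilities, sample, append, repeat. forward()
# is a placeholder for the trained transformer's forward pass.
import math, random

VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def forward(tokens: list[str]) -> list[float]:
    # Placeholder: a real model computes one logit per vocab entry here.
    return [random.gauss(0.0, 1.0) for _ in VOCAB]

def sample_next(tokens: list[str]) -> str:
    logits = forward(tokens)
    exps = [math.exp(l) for l in logits]
    probs = [e / sum(exps) for e in exps]               # softmax
    return random.choices(VOCAB, weights=probs, k=1)[0]

tokens = ["the"]
for _ in range(5):
    tokens.append(sample_next(tokens))                  # append and repeat
print(" ".join(tokens))
```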
How do they generate the text? Because to me it sounds like "we know how humans work, they make sounds with their mouths, they don't think, have a model of the world..."