Language is probably less than you think it is
This is a great post by Jennifer Moore, whose main point is about using AI for software development, but along the way she provides three paragraphs which get to the nub of why tools such as ChatGPT seem somewhat magical.
As Moore points out, large language models aren’t aware. They model things based on statistical probability. To my mind, it’s not so different from when my daughter was doing phonics, learning to recognise how words are constructed and to predict how words new to her would be spelled.
ChatGPT and the like are powered by large language models. Linguistics is certainly an interesting field, and we can learn a lot about ourselves and each other by studying it. But language itself is probably less than you think it is. Language is not comprehension, for example. It’s not feeling, or intent, or awareness. It’s just a system for communication. Our common lived experience has taught us that anything which can respond to and produce language in a sensible-enough way must be intelligent. But that’s because only other people have ever been able to do that before. It’s actually an incredible leap to assume, based on nothing else, that a machine which does the same thing is also intelligent. It’s much more reasonable to question whether the link we assume exists between language and intelligence actually exists. Certainly, we should wonder if the two are as tightly coupled as we thought.
That coupling seems even more improbable when you consider what a language model does, and—more importantly—doesn’t consist of. A language model is a statistical model of probability relationships between linguistic tokens. It’s not quite this simple, but those tokens can be thought of as words. They might also be multi-word constructs, like names or idioms. You might find “raining cats and dogs” in a large language model, for instance. But you also might not. The model might reproduce that idiom based on probability factors instead. The relationships between these tokens span a large number of parameters. In fact, that’s much of what’s being referenced when we call a model large. Those parameters represent grammar rules, stylistic patterns, and literally millions of other things.
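To make “a statistical model of probability relationships between linguistic tokens” concrete, here is a deliberately tiny sketch: a bigram model that counts which token follows which in a toy corpus, then samples accordingly. This is an illustrative miniature of my own, not how production LLMs work — they learn billions of parameters in a neural network rather than keeping raw counts — but the underlying object is the same kind of thing: probabilities over token sequences, nothing more.

```python
import random
from collections import defaultdict

# Toy corpus; real models train on vast amounts of text.
corpus = "it was raining cats and dogs so it was a wet day".split()

# Count how often each token follows each other token.
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_token(prev):
    """Sample the next token in proportion to how often it followed `prev`."""
    options = counts[prev]
    tokens = list(options)
    weights = list(options.values())
    return random.choices(tokens, weights=weights)[0]

# Generate a short continuation starting from "it".
token = "it"
out = [token]
for _ in range(4):
    token = next_token(token)
    out.append(token)
print(" ".join(out))
```

Note that “raining cats and dogs” can come out of this model either because the phrase was stored wholesale or because each word made the next one probable — exactly the ambiguity the paragraph above describes.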
What those parameters don’t represent is anything like knowledge or understanding. That’s just not what LLMs do. The model doesn’t know what those tokens mean. I want to say it only knows how they’re used, but even that is overstating the case, because it doesn’t know things. It models how those tokens are used. When the model works on a token like “Jennifer”, there are parameters and classifications that capture what we would recognize as things like the fact that it’s a name, it has a degree of formality, it’s feminine coded, it’s common, and so on. But the model doesn’t know, or understand, or comprehend anything about that data any more than a spreadsheet containing the same information would understand it.
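The spreadsheet comparison can be made literal. Below is the same sort of information about the token “Jennifer” stored as plain data — the field names are invented for illustration, and real model parameters are learned numeric weights rather than labelled facts — but the point stands either way: a structure can hold these classifications and answer queries about them without comprehending any of it.

```python
# A token's "parameters" as plain data — the spreadsheet version of the
# classifications described above. Field names are invented for
# illustration; real model parameters are learned weights, not labels.
token_row = {
    "token": "Jennifer",
    "is_name": True,
    "feminine_coded": True,
    "formality": "neutral",
    "frequency": "common",
}

# The dict "captures" these facts exactly as a spreadsheet row would:
# it stores and retrieves them, and understands nothing.
print(token_row["is_name"])  # → True
```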
Source: Jennifer++
Image: gapingvoid