AI, language models, understanding & programming
attention models are incapable of understanding, all they can do is predict words sequentially; this is fundamentally different to the kind of thinking that humans do.
it's also fundamentally different to how a CPU works when running a program, which is to say, these models are also not very good at programming. they can predict how a programmer might write some code, but they cannot understand what is happening at the semantic level.
in some ways, they're not too different from a novice programmer who has memorized what a program looks like but has no understanding of its underlying semantics. the only difference is that these models have a much larger capacity for memorization, and a much smaller capacity for understanding.
AI, language models, understanding & programming
> attention models are incapable of understanding, all they can do is predict words sequentially
It's a widespread misunderstanding, but this is not correct! It confuses interface with implementation, incentive with strategy.
On the outside, a model predicts one word at a time. On the inside, no one really knows what goes on. For all we know, it could be e.g. deciding between candidate sentences, before returning only the next word.
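Here's a deliberately silly sketch of what I mean, just to separate interface from implementation (the scoring function is pure make-believe, not what any real model does): the outside world only ever asks for "the next word", but nothing stops the inside from weighing whole candidate continuations first.

```python
# Toy illustration of interface vs. implementation, nothing more.
# The external interface is "give me the next word"; the internal strategy
# is free to compare entire candidate continuations before answering.

def score_continuation(prefix: str, continuation: str) -> float:
    # Made-up stand-in for whatever happens inside a real model.
    return -abs(len(prefix) - len(continuation))

def next_word(prefix: str, candidates: list[str]) -> str:
    # Internally: rank whole candidate sentences...
    best = max(candidates, key=lambda c: score_continuation(prefix, c))
    # ...externally: reveal only the first word of the winner.
    return best.split()[0]

print(next_word("The cat", ["sat on the mat", "ran up the very tall tree"]))
```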
AI, language models, understanding & programming
And models are "trained" -- that is, semi-randomly and iteratively selected from among all possible models -- to be good at prediction, but once again, how exactly they end up accomplishing that is opaque to us. All those billions of weights could in principle be encoding any number of different algorithms.
More on this if you're interested:
https://benlevinstein.substack.com/p/how-to-think-about-large-language
https://benlevinstein.substack.com/p/whats-going-on-under-the-hood-of#%C2%A7representing-truth
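(If it helps, here's a toy of the "selected to be good at prediction" framing — random search over bigram tables instead of gradient descent over billions of weights, so every detail in it is made up, but the point stands: we pick models by predictive score and never specify *how* they should achieve it.)

```python
import math, random

# Toy illustration of "training as selection": candidate models are random
# bigram tables, and we keep whichever one predicts the corpus best.
# Models are selected for predictive accuracy, not for any particular
# internal strategy.

corpus = "the cat sat on the mat the cat ran".split()
vocab = sorted(set(corpus))

def random_model():
    # A model is just a table P(next word | current word), chosen at random.
    model = {}
    for w in vocab:
        weights = [random.random() for _ in vocab]
        total = sum(weights)
        model[w] = {v: x / total for v, x in zip(vocab, weights)}
    return model

def loss(model):
    # Negative log-likelihood of the corpus under the model (lower is better).
    return -sum(math.log(model[a][b]) for a, b in zip(corpus, corpus[1:]))

best = random_model()
for _ in range(2000):
    candidate = random_model()
    if loss(candidate) < loss(best):
        best = candidate  # iterative, semi-random selection

print(f"best model's loss: {loss(best):.2f}")
```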
AI, language models, understanding & programming
@glaebhoerl to be clear, the issue with these models isn't the "sequential" part, it's the "words" part.
you can't feed a brain on words alone & expect it to develop human understanding & empathy. a quadrillion words aren't going to make up for the lack of embodied living experience.
AI, language models, understanding & programming
> so, it predicts words sequentially.
it does do that
> you can't feed a brain ... embodied living experience
that's a super interesting question and I'm inclined to agree with you, though my confidence is low and getting lower
programming and math are among the things I'd expect it's *most* possible to understand without that, though
...which you also say in toot #2, but seemingly the opposite in toot #1; could you explain?
AI, language models, understanding & programming
@typeswitch Hmm, it does seem like a stretch to infer semantics purely from source code, without any access to e.g. outputs or specs. Probably there is _some_ of that though, in the form of comments and tests. And idk how much documentation it gets.
From quick googling it also seems like it's based on top of GPT3, whose training data may contain all that other kind of stuff (e.g. articles on PL semantics) as well.
AI, language models, understanding & programming
@glaebhoerl I don't agree that it could in principle understand PL semantics. That rests on the assumption that it understands human language, which I think is a much wilder claim & one I don't believe (see rest of thread).
Whereas things like IDEs and compilers do understand PL semantics to some extent (e.g. what is being defined & where), enough to be useful. I think it's only a matter of time until someone figures out how to combine that kind of understanding with machine learning in a useful way.
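(Concretely, "what is being defined & where" is the kind of thing you can already extract mechanically — here's a rough sketch using Python's standard ast module, just to illustrate the shallow-but-real kind of semantic information I mean; the example source is obviously made up:)

```python
import ast

# A compiler-ish view of "what is being defined & where": walk the syntax
# tree and report definitions with their line numbers. This is the kind of
# (shallow but real) semantic information IDEs and compilers work with.

source = """
def area(r):
    pi = 3.14159
    return pi * r * r

class Circle:
    pass
"""

for node in ast.walk(ast.parse(source)):
    if isinstance(node, ast.FunctionDef):
        print(f"function {node.name!r} defined on line {node.lineno}")
    elif isinstance(node, ast.ClassDef):
        print(f"class {node.name!r} defined on line {node.lineno}")
    elif isinstance(node, ast.Assign):
        for target in node.targets:
            if isinstance(target, ast.Name):
                print(f"variable {target.id!r} defined on line {target.lineno}")
```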
AI, language models, understanding & programming
@typeswitch (To be clear, I'm not making any claim about what any actually existing models actually do or don't understand.)
Re-stating, the claim is that an LLM trained on this kind of corpus could, potentially, gain some understanding of math and programming and English sentences involving them.
I have no idea what it could "understand" about things with real-world referents, like cats or Chicago, but math & PL don't need much if any of that.
AI, language models, understanding & programming
Our concepts, symbols, and manipulations in math/PL are basically abstracted from the way things work in the physical world, e.g. what the number 2 means, what it means to have two of something.
If you *start out* at that level of abstraction with just symbols, obviously you can't project back down to the real world, but as long as you don't need to, it should be fine.
AI, language models, understanding & programming
@typeswitch It's perfectly possible to represent the concept of 2 with just symbols (two dots, vs. just one dot, or whatever).
Likewise, English sentences about these don't seem to need any referents that aren't available; understanding them would, at its core, boil down to understanding their logical content.
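(A minimal sketch of the "just symbols" point, in Peano style — Z, S, and add are names I'm making up for illustration; nothing in it refers to anything outside the program, yet "two" still adds up like 2:)

```python
# Peano-style numerals: the concept of 2 built from symbols alone.
# "Z" is zero, "S(n)" is the successor of n; nothing here refers to
# anything in the physical world, yet the arithmetic still works out.

def Z():
    return ("Z",)

def S(n):
    return ("S", n)

def add(a, b):
    # Defined purely by symbol rewriting: Z + b = b, S(a) + b = S(a + b).
    if a[0] == "Z":
        return b
    return S(add(a[1], b))

def count(n):
    # Only for printing: translate the symbolic numeral back to an int.
    return 0 if n[0] == "Z" else 1 + count(n[1])

two = S(S(Z()))              # "two dots" rather than one
print(count(add(two, two)))  # -> 4
```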
AI, language models, understanding & programming
@typeswitch Re-reading your initial toots, it sounds like we simply disagree about that, which is fine. But if you can pinpoint which part in particular the disagreement hinges on, I'd be interested.
AI, language models, understanding & programming
@typeswitch
So _in principle_ I think it _could_ be capable of understanding semantics, as it is. Whether it _does_, in the absence of advanced mind reading techniques, we can only try to infer empirically. (I'd guess general impressions are that it doesn't, but haven't looked closely.)
Of course I agree that a model deliberately trained that way would be more likely to attain understanding (slash likely to attain deeper/better understanding).