@mjk Vigilantbrain notices your queries aren't identical, but typing the GPT query into Google indeed doesn't do much to help things.
@deech FWIW, I got good mileage out of hissing at them when they did something I didn't want them to. Speak to them in their own language.
AI, language models, understanding & programming
@typeswitch Re-reading your initial tweets, it sounds like we simply disagree about that, which is fine. But if you can pinpoint which part in particular the disagreement hinges on, I'd be interested.
AI, language models, understanding & programming
@typeswitch It's perfectly possible to represent the concept of 2 with just symbols (two dots, vs. just one dot, or whatever).
Likewise, English sentences about these don't seem to need any referents that aren't available; understanding them would, at a core level, boil down to understanding their logical content.
AI, language models, understanding & programming
Our concepts, symbols, and manipulations in math/PL are basically abstracted from the way things work in the physical world, e.g. what the number 2 means, what it means to have two of something.
If you *start out* at that level of abstraction with just symbols, obviously you can't project back down to the real world, but as long as you don't need to, it should be fine.
AI, language models, understanding & programming
@typeswitch (To be clear, I'm not making any claim about what any actually existing models actually do or don't understand.)
Restating: the claim is that an LLM trained on this kind of corpus could, potentially, gain some understanding of math and programming, and of English sentences involving them.
I have no idea what it could "understand" about things with real-world referents, like cats or Chicago, but math & PL don't need much if any of that.
AI, language models, understanding & programming
So _in principle_ I think it _could_ be capable of understanding semantics, as it is. Whether it _does_, in the absence of advanced mind reading techniques, we can only try to infer empirically. (I'd guess general impressions are that it doesn't, but haven't looked closely.)
Of course I agree that a model deliberately trained that way would be more likely to attain understanding (slash likely to attain deeper/better understanding).
AI, language models, understanding & programming
@typeswitch Hmm, it does seem like a stretch to infer semantics purely from source code, without any access to e.g. outputs or specs. Probably there is _some_ of that, though, in the form of comments and tests. And I don't know how much documentation it gets.
From quick googling, it also seems like it's built on top of GPT-3, whose training data may contain all that other kind of material (e.g. articles on PL semantics) as well.
AI, language models, understanding & programming
> so, it predicts words sequentially.
it does do that
> you can't feed a brain ... embodied living experience
that's a super interesting question and I'm inclined to agree with you, albeit confidence is low and getting lower
programming and math are among the things I'd expect it's *most* possible to understand without that, though
...which you also say in toot #2, but seemingly the opposite in toot #1; could you explain?
AI, language models, understanding & programming
And models are "trained" -- that is, semi-randomly and iteratively selected from among all possible models -- to be good at prediction, but once again, how exactly they end up accomplishing that is opaque to us. All those billions of weights could in principle be encoding any number of different algorithms.
More on this if you're interested:
https://benlevinstein.substack.com/p/how-to-think-about-large-language
https://benlevinstein.substack.com/p/whats-going-on-under-the-hood-of#%C2%A7representing-truth
AI, language models, understanding & programming
> attention models are incapable of understanding, all they can do is predict words sequentially
It's a widespread misunderstanding, but this is not correct! It confuses interface with implementation, incentive with strategy.
On the outside, a model predicts one word at a time. On the inside, no one really knows what goes on. For all we know, it could be, e.g., deciding between candidate sentences before returning only the next word.
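To make the interface/implementation distinction concrete, here's a minimal sketch of the outside view (hypothetical model/tokenizer objects, not any particular library's API): the loop only ever asks for the single next token, whatever strategy produces it internally.

# Minimal sketch of the autoregressive *interface* (hypothetical names).
# Whatever the model does internally -- planning ahead, comparing candidate
# continuations, anything -- the only thing the outside ever sees is
# "given this prefix, what's the next token?".
def generate(model, tokenizer, prompt, max_new_tokens=50):
    tokens = tokenizer.encode(prompt)
    for _ in range(max_new_tokens):
        next_token = model.most_likely_next_token(tokens)  # one token at a time
        tokens.append(next_token)
    return tokenizer.decode(tokens)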
Number theorists love mutant versions of the integers with extra numbers thrown in. For example, 'Gaussian integers' are just numbers like
a + bi
where a, b are integers and i = √−1. It's fun to think about primes in the Gaussian integers. Some ordinary primes stay prime in the Gaussian integers, like 3. But others don't, like
2 = (1 - i)(1 + i)
and
5 = (2 + i)(2 - i)
Cool fact: an odd prime stays prime in the Gaussian integers if and only if it's 1 less than a multiple of 4.
(1/n)
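A quick brute-force way to sanity-check that fact in plain Python (just an illustration): an odd prime p splits in the Gaussian integers exactly when it's a sum of two squares, since p = a^2 + b^2 = (a + bi)(a - bi).

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n**0.5) + 1))

def splits_in_gaussian_integers(p):
    # p = a^2 + b^2 gives the factorization p = (a + bi)(a - bi)
    return any(round((p - a*a) ** 0.5) ** 2 == p - a*a
               for a in range(1, int(p**0.5) + 1))

for p in (n for n in range(3, 500, 2) if is_prime(n)):
    stays_prime = not splits_in_gaussian_integers(p)
    assert stays_prime == (p % 4 == 3)   # "1 less than a multiple of 4"
print("checked all odd primes below 500")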
@zwarich may be of interest: https://gist.github.com/glaebhoerl/d62d2b19365ae0d7c29102d0a5a6ab03
@JacquesC2 I'm not sure I fully grok your point, but it sounds like you may find https://benlevinstein.substack.com/p/whats-going-on-under-the-hood-of#%C2%A7representing-truth interesting as well, if you haven't already seen it
@forrestthewoods Ah indeed, seems I got 1 and 2 mixed up there. Thanks!
Big question:
Do you have any executive-summary-level thoughts or impressions on how it compares to Zig, especially w.r.t. the compile-time evaluation features? I've read various blog posts about Zig but haven't used it.
The temporary allocator stuff is interesting... I was thinking that since there's only one, it sounds like a disaster waiting to happen when different parts of the code (especially once you have more than one module/library/etc) expect their temporary allocations to stay alive for different periods. Is there some general logic by which this is expected to be avoided? Does actually having multiple separate temporary allocators via contexts figure into it?
@forrestthewoods Thanks for writing this! I've been curious about Jai but it was really hard to find info on it (especially in textual form).
Small questions:
I assume if you copy a relative pointer somewhere else and then dereference it, it will just silently go wrong? Avoiding that in general would seem to require a full system of copy/move constructors etc. (or perhaps pinned types)
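Roughly what I have in mind, as a toy Python simulation rather than actual Jai (hypothetical buffer layout, just to illustrate the failure mode): a relative pointer stores the distance to its target, so it only resolves correctly from the location where it was originally written, and a raw byte copy silently re-bases it.

import struct

buf = bytearray(64)

def write_rel_ptr(at, target):
    struct.pack_into("<i", buf, at, target - at)   # store an offset, not an address

def read_rel_ptr(at):
    (off,) = struct.unpack_from("<i", buf, at)
    return at + off                                # resolve relative to the pointer's own location

write_rel_ptr(0, 40)                 # pointer stored at offset 0, pointing at offset 40
buf[40] = 123
print(buf[read_rel_ptr(0)])          # 123 -- works as intended

buf[8:12] = buf[0:4]                 # naive byte-for-byte copy of the pointer
print(buf[read_rel_ptr(8)])          # resolves to offset 48 instead of 40 -- silently wrong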