niplav boosted
I speak Arabic to God, English to my horses, Russian to men, and I don't speak to women.

more towards zero than one (so <<5%). Obviously the value-laden part wouldn't be solved.

I think the philosophy/math/cs system would be just as capable at capabilities work as at alignment work.

But I now remember an old idea of making STEMGPT: trained, in the weak case, only on STEM textbooks and arXiv; in the strong case, only on hadron collider data, protein structures, meteorological and geological data &c. Hard to keep info about humans from leaking in, though.

How much of strawberry alignment is value-laden? 5%? 95%? Probably further along some logarithmic scale, if I had to bet.
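If I cash "some logarithmic scale" out as log-odds (just one possible reading), 5% and 95% sit symmetrically far from even odds, and "further along" means further into a tail:

\ell(p) = \log \frac{p}{1-p}, \qquad \ell(0.05) = -\log 19 \approx -2.94, \qquad \ell(0.95) = +\log 19 \approx +2.94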

The, ah, fifth thing I disagree with Eliezer about.

Even with ML systems!

I agree that, with most architectures, if you train them a lot to be capable alignment theorists, they probably develop inner optimizers that are capable consequentialists; but the alignment-theorist phase might be quite long (I could_{10%} see it going past 100x human ability).

If we had those widely distributed, people would likely use them for capabilities and just widen the gap. (E.g. OpenAI, who talk about this as a strategy, are not to be trusted with it: I don't see them spending half a year using such systems solely on alignment work rather than on both capabilities and alignment. The plan itself is sound, though.)

But I disagree with the view that you can't have an alignment theorist that is not also a consequentialist.

Hm. I think the type of philosophy/math/cs needed for successful strawberry alignment is close enough to regular theorem-proving that AI systems that aren't seeds for worldcrunchers would still be very helpful.

(Doesn't feel to me like it touches the consequentialist core of cognition; a lot of philosophy is tree-traversal and finding inconsistent options, and math also feels like an MCTS-like thing.)

Is the advantage we'd gain from good alignment-theorist ML systems 1.5x, 10x, or 100x?

Telling my kidnappers about AI alignment until they gag me

Update: there's a bunch of women using the Replika thing.

I'd like to see the ratio

(95% confidence interval: [10%, 65%])

Man, I do have a lot more respect for Oliver Habryka after listening to this[1]. Highlights include naming the thing where high-status people eschew meritocracy because they can only lose, and the statement that there might be 5-10 years in the medium-term future that are about as crazy as 2020, or crazier.

[1]: thefilancabinet.com/episodes/2

Hm, I remember reading somewhere, sometime, a classification of the ways you can use unix programs in pipes:

Sources (<, cat, programs that just produce output), filters (removing data, such as wc), transformers(?) (such as sort, cut, awk), and sinks (>, programs that just execute). Anyone recollect where I could've gotten that from?
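If that carving is right, a concrete pipeline touching all four roles might look like this (file name and pattern invented for illustration, with grep standing in as the filter):

cat access.log |      # source: just produces output
  grep 'GET' |        # filter: drops non-matching lines
  cut -d' ' -f1 |     # transformer: reshapes each line
  sort | uniq -c |    # transformers: reorder, then count duplicates
  sort -rn > top.txt  # one more transformer; > is the sink that consumes the stream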

people on the timeline are wrong

I have just the right thing

niplav boosted

Just learned set theory and I cannot contain myself.

*edit*
This post hit 500 boosts and 1k likes :D
Trans rights are human rights.
Bash the fash.

niplav boosted

If you rearrange the letters of POSTMEN, they become VERY ANGRY.
