Hm. I think the type of philosophy/math/cs needed for successful strawberry alignment is close enough to regular theorem-proving that AI systems that aren't seeds for worldcrunchers would still be very helpful.

(Doesn't feel to me like it touches the consequentialist core of cognition; a lot of philosophy is tree-traversal and finding inconsistent options, and math also feels like an MCTS-like thing)
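
To gesture at the tree-traversal part, a toy sketch (the proposition names and the inconsistency pairs are made up for illustration; this isn't any real system's architecture): enumerate combinations of positions and prune any branch that already contains an inconsistent pair.

```python
from itertools import combinations

# Hypothetical, illustrative "known inconsistent" pairs of positions.
INCONSISTENT = {
    ("free_will_libertarian", "hard_determinism"),
    ("moral_realism", "error_theory"),
}

def consistent(positions):
    """A set of positions is consistent if it contains no known-inconsistent pair."""
    return not any(
        (a, b) in INCONSISTENT or (b, a) in INCONSISTENT
        for a, b in combinations(positions, 2)
    )

def search(chosen, remaining):
    """Depth-first tree-traversal over candidate positions,
    pruning any branch that is already inconsistent."""
    if not consistent(chosen):
        return                      # prune this branch
    if not remaining:
        yield frozenset(chosen)     # a surviving combination of positions
        return
    head, *rest = remaining
    yield from search(chosen | {head}, rest)   # branch: adopt the position
    yield from search(chosen, rest)            # branch: skip it

if __name__ == "__main__":
    options = ["free_will_libertarian", "hard_determinism",
               "moral_realism", "error_theory"]
    for view in search(set(), options):
        print(sorted(view))
```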

Is the advantage we'd get from good alignment-theorist ML systems 1.5x, 10x, or 100x?

If those were widely distributed, people would likely use them for capabilities and just widen the gap (e.g. OpenAI, who talk about this as a strategy, are not to be trusted with it: I don't see them spending half a year using such a system solely for alignment work rather than on both capabilities and alignment. The plan itself is sound in that regard, though).

But I disagree with the view that you can't have an alignment theorist that isn't also a consequentialist.

Even with ML systems!

I agree that with most architectures, if you train them hard enough to be capable alignment theorists, they probably end up with inner optimizers that are capable consequentialists, but the alignment-theorist phase might be quite long (I could_{10%} see it going past 100x human ability).

The, ah, fifth thing I disagree with Eliezer about.
