Hm. I think the type of philosophy/math/CS needed for successful strawberry alignment is close enough to regular theorem-proving that AI systems that aren't seeds for worldcrunchers would still be very helpful.
(It doesn't feel to me like it touches the consequentialist core of cognition: a lot of philosophy is tree-traversal and finding inconsistent options, and math also feels like an MCTS-like thing.)
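To gesture at what I mean by "tree-traversal and finding inconsistent options", here's a toy sketch. Everything in it is made up for illustration: the choice points, the "inconsistency" table, and all the names are hypothetical, and an MCTS-flavored version would swap the exhaustive depth-first loop for sampled rollouts with value estimates. It's not anyone's actual system, just the shape of the search.

```python
# Toy sketch: treat a chunk of philosophy as search over a tree of
# positions, pruning any branch that contains an inconsistent pair of
# commitments. All content here is stipulated for illustration.
from itertools import combinations

# Hypothetical choice points (pick one commitment at each level).
CHOICES = [
    ["moral realism", "moral anti-realism"],
    ["consequentialism", "deontology"],
    ["agent-neutral value", "agent-relative value"],
]
# Pairs we stipulate to be jointly inconsistent.
INCONSISTENT = {
    frozenset({"moral anti-realism", "agent-neutral value"}),
    frozenset({"deontology", "agent-neutral value"}),
}

def consistent(position):
    """A position survives if none of its pairs is flagged."""
    return all(frozenset(pair) not in INCONSISTENT
               for pair in combinations(position, 2))

def search(position=(), depth=0):
    """Depth-first tree-traversal; prune as soon as a branch goes bad."""
    if not consistent(position):
        return  # found an inconsistent option: abandon this subtree
    if depth == len(CHOICES):
        yield position  # a fully specified, still-consistent position
        return
    for commitment in CHOICES[depth]:
        yield from search(position + (commitment,), depth + 1)

for p in search():
    print(p)
```

The point of the sketch is that nothing in the loop is a long-horizon planner: it enumerates options and discards the inconsistent ones, which is why this kind of work seems doable without touching the consequentialist core.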
Is the advantage we'd get from good alignment-theorist ML systems 1.5x, 10x, or 100x?
Even with ML systems!
I agree that with most architectures, if you train them hard to be capable alignment theorists, they probably end up with inner optimizers that are capable consequentialists, but the alignment-theorist phase might be quite long (I could_{10%} see it going over 100x human ability).
The, ah, fifth thing I disagree with Eliezer about.