Hm. I think the type of philosophy/math/cs needed for successful strawberry alignment is close enough to regular theorem-proving that AI systems that aren't seeds for worldcrunchers would still be very helpful.
(It doesn't feel to me like it touches the consequentialist core of cognition: a lot of philosophy is tree-traversal and pruning away inconsistent options, and math also feels like an MCTS-like thing.)
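To make the "tree-traversal and pruning inconsistent options" picture concrete, here's a minimal sketch. The position names and the inconsistency relation are invented for illustration; the point is only the shape of the search: a depth-first walk over sets of candidate positions that cuts a branch as soon as it contains a known-inconsistent pair.

```python
# Toy philosophical positions; names and the inconsistency relation
# below are made up purely for illustration.
POSITIONS = ["physicalism", "irreducible-qualia", "epiphenomenalism"]
INCONSISTENT = {frozenset({"physicalism", "irreducible-qualia"})}

def consistent(view: frozenset) -> bool:
    # A view is consistent if it contains no known-inconsistent pair.
    return all(not bad <= view for bad in INCONSISTENT)

def search(items, partial=frozenset()):
    """Depth-first traversal of the tree of candidate 'views' (sets of
    positions), pruning a branch as soon as it becomes inconsistent --
    every extension of an inconsistent view stays inconsistent."""
    if not consistent(partial):
        return
    yield partial
    for i in range(len(items)):
        yield from search(items[i + 1:], partial | {items[i]})

for view in search(POSITIONS):
    print(sorted(view))
```

Nothing consequentialist in there: no world-model, no outcome-ranking, just enumeration plus a consistency check, which is the intuition for why this kind of work might be safe to delegate.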
Is the advantage we'd get from good alignment-theorist ML systems 1.5x, 10x, or 100x?
I think the philosophy/math/cs system would be just as capable at capabilities work as at alignment work.
But I now remember an old idea of making STEMGPT: trained, in the weak case, only on STEM textbooks and arXiv; in the strong case, only on hadron-collider data, protein structures, meteorological and geological data &c. It would be hard to keep information about humans from leaking in, though.
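A naive sketch of what the weak-case filter might look like, and why the leakage worry bites. Everything here (the keyword list, source tags, example documents) is a made-up illustration, not a proposed implementation:

```python
# Keep a document only if it comes from a whitelisted STEM source
# and trips no human-related keyword. Both lists are illustrative.
HUMAN_KEYWORDS = {"human", "people", "patient", "society", "culture"}
STEM_SOURCES = {"stem-textbook", "arxiv", "collider", "protein-db", "geodata"}

def keep(doc_source: str, doc_text: str) -> bool:
    if doc_source not in STEM_SOURCES:
        return False
    text = doc_text.lower()
    # Leaky: misses synonyms, other languages, and implicit references
    # to humans (anatomy, demographics, experimental protocols...).
    return not any(kw in text for kw in HUMAN_KEYWORDS)

corpus = [
    ("arxiv", "We measure the Higgs boson mass from dilepton events."),
    ("arxiv", "We segment tumours in human MRI scans."),  # caught here
    ("blog", "My favourite proofs of quadratic reciprocity."),
]
print([text for src, text in corpus if keep(src, text)])
```

Even granting a much smarter filter than this keyword check, STEM corpora are saturated with facts about the beings who wrote them, which is the "hard to have info about humans leak over" problem.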
How much of strawberry alignment is value-laden? 5%? 95%? If I had to bet, the answer sits along some logarithmic scale,
much closer to zero than to one (so <<5%). Obviously the value-laden part wouldn't be solved by such a system.