If ChatGPT is what gets you caring about AI alignment, you don't understand the problem
OK, weakening my position slightly: this is an excellent example of how hard it is to use RLHF to train out the original objective in favor of corporate blandness
a Schelling point for those who seek one