10 years ago, a lot of AI Safety discussion turned on distilling humanity's meta-ethics into a machine-readable form. Today our most impressive AIs approximately reflect all the human content we could find for them, encoded in a semantically meaningful way. We can convey intuitive preferences to the machines now. We can't guarantee that they'll actually optimize for those preferences, but the fact that the concepts are available at all seems under-discussed.
@jai From the early days, the plan was "figure out what human values are, and optimize for them". The problem has always been (1) how to encode that sentiment, and (2) how to build an AGI that reliably optimizes for *anything at all*, even, e.g., maximizing diamond.
@ciphergoth (1) seems much more tractable now than I would have expected prior to the advent of LLMs. Old LW lore often invoked evil genies who would follow the letter but not the spirit of the utility function. It seems like that problem is ~solved now. (2) still looms large, of course - but if we've stumbled into almost solving the evil genie subproblem, that seems worth celebrating.