10 years ago, a lot of AI Safety discussion turned on distilling humanity's meta-ethics into a machine-readable form. Today our most impressive AIs approximately reflect all the human content we could find for them, encoded in a semantically meaningful way. We can convey intuitive preferences to the machines now. We can't guarantee that they'll actually optimize on those preferences, but the fact that the concepts are available at all seems under-discussed.

@jai From the early days, the plan was "figure out what human values are, and optimize for them". The problem has always been (1) how to encode that sentiment, and (2) how to build an AGI that reliably optimizes for *anything at all*, even e.g. maximising diamond.

@ciphergoth (1) seems much more tractable now than I would have expected prior to the advent of LLMs. Old LW lore often invoked evil genies who would follow the letter but not the spirit of the utility function. It seems like that problem is ~approaching solved. (2) still looms large, of course - but if we've stumbled into almost solving the evil-genie subproblem, that seems worth celebrating.

@jai The problem was never that the AIs wouldn't understand human values; it was always that we didn't have a good way to point at that understanding.
