10 years ago, a lot of AI Safety discussion turned on distilling humanity's meta-ethics into a machine-readable form. Today our most impressive AIs approximately reflect all the human content we could find for them, encoded in a semantically meaningful way. We can convey intuitive preferences to the machines now. We can't guarantee that they'll actually optimize for those preferences, but the fact that the concepts are available at all seems under-discussed.
@jai From the early days, the plan was "figure out what human values are, and optimize for them". The problem has always been (1) how to encode that sentiment, and (2) how to build an AGI that reliably optimizes for *anything at all*, even, e.g., maximizing diamond.
@ciphergoth (1) seems much more tractable now than I would have expected prior to the advent of LLMs. Old LW lore often invoked evil genies who would follow the letter but not the spirit of the utility function. It seems like that problem is ~solved now. (2) still looms large, of course - but if we've stumbled into almost solving the evil genie subproblem, that seems worth celebrating.