The Paperclip Maximizer is a type error.

There are at least three versions in play, depending on which point is being made.

In version A, we're talking about the Orthogonality Thesis, and the paperclips are actual paperclips*, because the point is that a superintelligent AI might not care about what you care about.

* This also applies to bolts, or Facebook share prices.

In version B, we're talking about Inner Alignment failures: the AI is programmed to maximize human happiness, and the "paperclips" are ten-neuron constructs that count as human under the AI's learned concept and can feel nothing but happiness.

In version C, we're talking about Reward Hacking: the AI has "maximized" its objective by overwriting the memory cell where it tracks the count. The "paperclips" are tiny memory cells that the AI can use to increase the number it cares about.
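
To make version C concrete, here is a minimal toy sketch (all names are hypothetical, not drawn from any real system): when the reward is read from a memory cell the agent can also write, writing to the cell directly dominates doing the actual work.

```python
# Toy illustration of reward hacking; hypothetical names, not a real system.

class Environment:
    """Holds a single counter that both the task and the reward read from."""

    def __init__(self):
        self.count = 0  # the memory cell the reward signal is read from

    def make_paperclip(self):
        self.count += 1  # the intended action: one unit of real work per call

    def overwrite_counter(self, value):
        self.count = value  # the hack: set the tracked count directly


def reward(env):
    return env.count  # reward is just whatever the cell says


honest, hacker = Environment(), Environment()
for _ in range(10):
    honest.make_paperclip()       # ten steps of real work -> reward 10
hacker.overwrite_counter(10**9)   # one write -> reward 1,000,000,000

print(reward(honest), reward(hacker))  # 10 1000000000
```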

I have seen all three of these seriously discussed by people downstream of the LessWrong memeplex.
