In version B, we're talking about Inner Alignment failures, where the AI is programmed to maximize human happiness, and the "paperclips" are 10-neuron constructs that count as human to the AI and can only feel happiness.
In version A, we're talking about the Orthogonality Thesis, and the paperclips are actual paperclips*, because the point is that a superintelligent AI might not care about what you care about.
* This also applies to bolts, or Facebook share prices.
@empathy2000 is this just because we use the jargon "tacit knowledge" for that category, or do you think there's more discussion missing?
@flats I think the instrumental convergence argument is still pretty good. It does rely somewhat on the idea that the AI will be trained to optimize a single metric.
When reinforcement learning seemed like the winning technique, this was a big risk. Now that LLMs are the most promising technique, it's less clear. <Minimize next token prediction error> doesn't obviously call for conquering the universe.
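For concreteness, a minimal sketch of what that single metric actually is (assuming a PyTorch-style setup; the random tensors and shapes are just stand-ins for a real model and real training text):

```python
import torch
import torch.nn.functional as F

# Toy stand-in for "minimize next-token prediction error": the entire
# training signal is cross-entropy between the model's predicted
# distribution over the next token and the token that actually came next.
vocab_size, seq_len, batch = 50_000, 128, 4

logits = torch.randn(batch, seq_len, vocab_size)          # pretend model output
tokens = torch.randint(0, vocab_size, (batch, seq_len))   # pretend training text

# Position t is scored on how well it predicts token t+1.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
print(loss)  # one scalar about text statistics, nothing about the world
```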
@flats right. The question is how many of the fundamental arguments were worked out assuming that the goal was to build a CEV sovereign and never rechecked to see if they still apply now that that goal has been abandoned.
@flats If the AI isn't going to acquire godlike power, how many of the issues devolve into the principal-agent problem?
But no one wants to double-check 1000 pages of blog posts to see if the conclusion relies on an unstated assumption.
@flats I think the problem is that a lot of their thinking on AI has a presumed final step <then we give it control over everything and it instantiates heaven on earth>, and a lot of the threats hinge on the implicit assumption that you will give the AI control over everything.
So, an AI might conceal its real goals... Is that an issue if it is only going to get enough power to run the factory?
Maybe, maybe not. But we have to check every argument.
@flats it looks like I won't have time to write a real post anytime soon, so I'll point you to this short summary instead:
https://twitter.com/WomanCorn/status/1631696104403107844?s=19
What I find amazing is that none of the glass parts of the lamp broke. I'd expect those to be the easiest to break.
@lispegistus if you wait until the 1919 eclipse, you don't beat the standard timeline.
Is there a way to do it sooner?