
Misalignment in cancers and antibiotic-resistant bacteria, and why selecting against it is not enough

I guess people could argue that SGD is strong enough to just remove all deception in the first pass, while the deception is still weak

@niplav I think I don't know what misalignment means in this context.

@Paradox you're right

Okay

Adversarial selection pressure against something can make the thing worse, monotonically in the strength of the pressure

This might also apply to misaligned AI systems selected by search

@niplav Ok, I imagine adversarial selection pressure means you're just telling the AI not to do X, and the more you punish it, the worse the thing gets, but I don't know why that would happen.

@Paradox you're selecting both against the behavior *and* against your ability to detect/effectively deinforce the behavior

Similar to antibiotic-resistant bacteria, where we select both against the bacteria and against our ability to defeat them

@niplav Ohhh yeah.
Cuz when you use antibiotics too much, it gives them the resources to figure out how to resist it. That's kinda just evolution. If you don't get rid of a thing entirely, it will eventually become harder to do so. So what's the alternative here?

@Paradox good Q! Best option I know is kill on the first try

Otherwise 🤷
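To make the antibiotic analogy concrete, here is a toy deterministic model (a sketch, not anything from the thread; the names `Population`, `step`, `run`, `hidden_share` and all numbers are hypothetical). Selection removes only the misbehavers the detector can see, and a small fraction of survivors turn into variants the detector misses. Under these assumptions, more pressure means a larger share of the remaining misbehavior is hidden, and only pressure of 1.0, applied before any hidden variant exists, removes it entirely.

```python
from dataclasses import dataclass


@dataclass
class Population:
    # Toy model (hypothetical): shares of a fixed-size population.
    aligned: float     # behaves well
    detectable: float  # misbehaves, and the detector catches it
    hidden: float      # misbehaves, and the detector misses it

    def total(self) -> float:
        return self.aligned + self.detectable + self.hidden


def step(pop: Population, pressure: float, mutation: float) -> Population:
    """One round: selection first, then replication with mutation."""
    # Selection: remove a fraction `pressure` of the misbehavers we can detect.
    # Hidden misbehavers are untouched because the detector cannot see them.
    survivors = pop.detectable * (1.0 - pressure)
    # Mutation (assumed): a fraction `mutation` of the surviving detectable
    # misbehavers become variants that evade the detector.
    newly_hidden = survivors * mutation
    detectable = survivors - newly_hidden
    hidden = pop.hidden + newly_hidden
    # Regrow to the original population size, keeping proportions fixed.
    scale = pop.total() / (pop.aligned + detectable + hidden)
    return Population(pop.aligned * scale, detectable * scale, hidden * scale)


def run(pressure: float, rounds: int = 10, mutation: float = 0.01) -> Population:
    # Start with no hidden misbehavers at all.
    pop = Population(aligned=0.9, detectable=0.1, hidden=0.0)
    for _ in range(rounds):
        pop = step(pop, pressure, mutation)
    return pop


def hidden_share(pop: Population) -> float:
    """Fraction of the remaining misbehavers that the detector cannot see."""
    misbehaving = pop.detectable + pop.hidden
    return pop.hidden / misbehaving if misbehaving > 0 else 0.0


for p in (0.1, 0.5, 0.9, 1.0):
    pop = run(p)
    print(f"pressure={p:.1f}  misbehaving={pop.detectable + pop.hidden:.2e}  "
          f"hidden share={hidden_share(pop):.3f}")
```

The `pressure=1.0` case plays the role of "kill on the first try": every detectable misbehaver is gone before any can mutate. That depends on the assumed ordering (selection before mutation); if mutation came first, even total pressure would leave some hidden variants behind. The model also gives hidden variants no growth advantage, so it is only meant to show the direction of the effect, not its size.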

Mastodon

a Schelling point for those who seek one