Misalignment in cancers and antibiotic-resistant bacteria, and selecting against it is not enough
@niplav I think I don't know what misalignment means in this context.
@Paradox you're right
Okay
Adversarial selection pressure against something can worsen the badness of the thing, monotonic in the strength of the pressure
This might also apply to misaligned AI systems selected by search
@Paradox this one ^ i endorse far more
@niplav Ok I imagine adversarial selection pressure means that you're just telling the AI to not do X, and the more you punish that, the worse a thing gets, but I don't know why it would do that.
@Paradox you're selecting both against the behavior *and* against your ability to detect/effectively deinforce the behavior
Similar to antibiotic-resistant bacteria, where we select against bacteria and against our ability to defeat bacteria
@niplav Ohhh yeah.
Cuz when you use antibiotics too much, it gives them the resources to figure out how to resist it. That's kinda just evolution. If you don't get rid of a thing entirely, it will eventually become harder to do so. So what's the alternative here?
@Paradox good Q! Best option I know is kill on the first try
Otherwise 🤷
I guess people could argue SGD is strong enough to just remove all deception in the first pass when it's still weak
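A toy sketch of the "selecting against the behavior *and* against your ability to detect it" point from above. Everything here is an illustrative assumption, not anything from a real system: the function name `select_against`, the four stealth levels, and the rule that a variant survives with probability `1 - pressure * (1 - stealth)`. It only shows that, under those assumptions, stronger pressure leaves survivors that are harder to detect.

```python
def select_against(pressure, generations=10):
    """Mean stealth of the surviving 'bad' variants after repeated selection.

    pressure: how hard we select against detected variants, from 0.0 to 1.0.
    """
    # Hypothetical stealth levels: chance that a bad variant evades detection.
    stealth_levels = [0.0, 0.25, 0.5, 0.75]
    # Start with equal shares of each variant.
    shares = [0.25] * len(stealth_levels)

    for _ in range(generations):
        # A variant is removed only if it is both targeted and detected.
        survived = [
            share * (1 - pressure * (1 - stealth))
            for share, stealth in zip(shares, stealth_levels)
        ]
        # Survivors regrow to fill the population (renormalize the shares).
        total = sum(survived)
        shares = [s / total for s in survived]

    # Average stealth of whatever is left.
    return sum(share * stealth for share, stealth in zip(shares, stealth_levels))


for pressure in (0.0, 0.5, 1.0):
    print(pressure, round(select_against(pressure), 3))
```

With no pressure the mean stealth stays at its starting value of 0.375. As pressure goes up, the survivors skew toward the stealthiest variant. Since nothing here ever reaches 100% detection, the population is never wiped out, which is the "kill on the first try, otherwise 🤷" problem.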