@niplav I think I don't know what misalignment means in this context.
@Paradox you're right
Okay
Adversarial selection pressure against something can make that thing worse, and the effect grows monotonically with the strength of the pressure
This might also apply to misaligned AI systems selected by search
@niplav Ok, I imagine adversarial selection pressure means that you're just telling the AI to not do X, and the more you punish that, the worse the thing gets, but I don't know why it would do that.
@Paradox you're selecting both against the behavior *and* against your ability to detect/effectively deinforce the behavior
Similar to antibiotic-resistant bacteria, where we select against the bacteria and, at the same time, against our ability to defeat them
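A toy sketch of that double selection, under made-up assumptions: each variant has a `bad` level and a `stealth` level on a small grid, the chance of being caught is `bad * (1 - stealth)`, and survival weight is `1 - pressure * p_caught`. All names and the detection model are hypothetical, picked only to illustrate the point. It models a single round of selection, and it only shows that the surviving bad behavior becomes harder to detect, not the stronger claim that total badness goes up.

```python
from itertools import product


def surviving_population(pressure, levels=5):
    """Return {(bad, stealth): weight} after one round of selection.

    Hypothetical model: variants start uniform on a grid, and a variant is
    caught with probability bad * (1 - stealth).
    """
    grid = [i / (levels - 1) for i in range(levels)]
    weights = {}
    for bad, stealth in product(grid, grid):
        p_caught = bad * (1.0 - stealth)
        # Stronger pressure removes more of the variants that get caught.
        weights[(bad, stealth)] = 1.0 - pressure * p_caught
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}


def mean_badness(pop):
    """Average badness of the surviving population."""
    return sum(bad * w for (bad, _), w in pop.items())


def hidden_share(pop):
    """Fraction of the surviving badness that the detector would miss."""
    hidden = sum(bad * stealth * w for (bad, stealth), w in pop.items())
    return hidden / mean_badness(pop)


if __name__ == "__main__":
    for pressure in (0.0, 0.5, 1.0):
        pop = surviving_population(pressure)
        print(pressure, round(mean_badness(pop), 3), round(hidden_share(pop), 3))
```

In this toy, going from pressure 0 to pressure 1 lowers mean badness from 0.5 to about 0.42, but the share of remaining badness that is hidden from the detector rises from 0.5 to 0.65, and it rises monotonically with the pressure.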