@niplav maybe a bit? it might make the policy behave more like a quantilizer than a straight-up utility optimizer, if you say "find the optimal policy within some maximum distance from the pretrained LLM". not in the same formal way as the other impact measures, but it could serve a similar practical function.
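a toy sketch of the comparison, assuming a small discrete action set. `base` stands in for the pretrained policy, and every helper name here is hypothetical. a q-quantilizer keeps the top-q probability mass of the base policy (ranked by utility) and renormalizes, so it sits at exactly KL = log(1/q) from the base. the optimum of E[U] - beta * KL(pi || base) is the exponential tilt pi(a) ∝ base(a) * exp(U(a)/beta). by the usual Lagrangian argument, a hard "max KL distance" constraint gives this same form for some beta. both stay anchored to the base distribution instead of jumping to argmax U.

```python
import math

def quantilize(base, utility, q):
    """q-quantilizer (hypothetical helper): keep the top-q probability mass of
    the base policy, ranked by utility, then renormalize."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    kept, mass = {}, 0.0
    # walk actions from highest to lowest utility until q mass is collected
    for a in sorted(base, key=lambda a: utility[a], reverse=True):
        if mass >= q:
            break
        take = min(base[a], q - mass)  # the boundary action may be kept only partially
        if take > 0:
            kept[a] = take
            mass += take
    return {a: p / mass for a, p in kept.items()}

def kl_tilted(base, utility, beta):
    """Maximizer of E_pi[U] - beta * KL(pi || base): pi(a) ∝ base(a) * exp(U(a) / beta)."""
    weights = {a: p * math.exp(utility[a] / beta) for a, p in base.items()}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}

def kl(pi, base):
    """KL(pi || base) in nats, summed over the actions pi actually uses."""
    return sum(p * math.log(p / base[a]) for a, p in pi.items() if p > 0)

# toy "pretrained policy" and utility (made-up numbers, purely illustrative)
base = {"a": 0.5, "b": 0.3, "c": 0.2}
utility = {"a": 0.0, "b": 1.0, "c": 2.0}

qp = quantilize(base, utility, q=0.5)
tp = kl_tilted(base, utility, beta=1.0)
print(qp, kl(qp, base))  # KL ≈ log(1/q) = log 2 ≈ 0.693
print(tp, kl(tp, base))
```

smaller q or smaller beta pushes harder toward argmax U. as q → 1 or beta → ∞, both fall back to the base policy.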