Does regularization of RL policies act as an impact measure?

@niplav Maybe a bit? It might work somewhat like making the policy a quantilizer rather than a straight-up utility optimizer, e.g. if you look for the optimal policy within some maximum distance from the pretrained LLM. Not in the same formal way as the other impact measures, but it could have a similar practical function.
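
A rough sketch of that comparison, in case it helps. This is a toy and not anything from the thread: it assumes a small discrete action set, takes "distance" to mean KL divergence from the base policy, and uses a KL penalty with weight `beta` in place of a hard maximum distance. The function names, the base probabilities and the utilities are all made up for illustration. For the penalty form, the policy maximizing expected utility minus `beta` times KL is the base distribution reweighted by `exp(U / beta)`; the quantilizer-style policy instead samples from the base distribution restricted to its top `q` fraction of probability mass, ranked by utility.

```python
import math

def kl_regularized_policy(base_probs, utilities, beta):
    """Maximizer of E_pi[U] - beta * KL(pi || base): pi(a) is proportional to base(a) * exp(U(a) / beta)."""
    max_u = max(utilities)  # subtract the max for numerical stability
    weights = [p * math.exp((u - max_u) / beta) for p, u in zip(base_probs, utilities)]
    total = sum(weights)
    return [w / total for w in weights]

def quantilizer_policy(base_probs, utilities, q):
    """Base distribution conditioned on its top-q fraction of probability mass, ranked by utility."""
    order = sorted(range(len(utilities)), key=lambda i: utilities[i], reverse=True)
    policy = [0.0] * len(base_probs)
    remaining = q
    for i in order:
        take = min(base_probs[i], remaining)  # mass of this action that fits in the top-q slice
        policy[i] = take / q
        remaining -= take
        if remaining <= 1e-12:
            break
    return policy

def kl_divergence(policy, base_probs):
    """KL(policy || base) in nats, skipping actions the policy never takes."""
    return sum(p * math.log(p / b) for p, b in zip(policy, base_probs) if p > 0)

# Hypothetical toy setup: the last action is rare under the base policy
# but has extreme utility (a stand-in for a high-impact action).
base = [0.7, 0.2, 0.09, 0.01]
utils = [1.0, 2.0, 3.0, 100.0]

for beta in (1000.0, 10.0, 0.01):
    pi = kl_regularized_policy(base, utils, beta)
    print("beta", beta, "policy", pi, "KL", kl_divergence(pi, base))

quant = quantilizer_policy(base, utils, q=0.1)
print("quantilizer", quant, "KL", kl_divergence(quant, base))
```

In this toy, a large `beta` keeps the policy close to the base distribution and a small `beta` collapses it onto the highest-utility action. The quantilizer never assigns an action more than `1/q` times its base probability, so its KL from the base is at most `log(1/q)`. The KL penalty gives no such per-action cap, which is one way the two differ formally even if they behave similarly in practice.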
