@niplav One reason might be reasoning about things with different intrinsic time horizons.
In a shooter, you'd want a very high decay rate for learning to aim, since the feedback is quick. A discount rate tuned to 5 minutes fits if matches are expected to last 5 minutes; that covers strategy within one match. A discount rate tuned to 30 minutes could span a whole set of matches, picking up statistical patterns like "people never use the same strategy twice in a row".
I don't think this is common in practice, though. In RL, people usually pick a single rate.
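To make the horizons concrete, here's a rough sketch of how each one maps to a discount factor, using the common rule of thumb that gamma has an effective horizon of about 1 / (1 - gamma) steps. The tick rate, the 1-second aiming horizon, and the helper names are all made up for illustration; only the 5-minute and 30-minute figures come from the post.

```python
def gamma_for_horizon(horizon_steps: float) -> float:
    """Discount factor whose effective horizon 1 / (1 - gamma) equals horizon_steps."""
    if horizon_steps < 1:
        raise ValueError("horizon must be at least one step")
    return 1.0 - 1.0 / horizon_steps


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over the sequence, accumulated from the end."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


STEPS_PER_SECOND = 10  # assumed tick rate, purely illustrative

horizons_seconds = {
    "aiming": 1,                # assumption: aiming feedback arrives within ~1 s
    "one match": 5 * 60,        # 5-minute match
    "set of matches": 30 * 60,  # 30-minute session
}

gammas = {
    name: gamma_for_horizon(seconds * STEPS_PER_SECOND)
    for name, seconds in horizons_seconds.items()
}

# A single reward of 1.0 that only arrives at the end of a 5-minute match.
match_steps = 5 * 60 * STEPS_PER_SECOND
late_reward = [0.0] * (match_steps - 1) + [1.0]

for name, gamma in gammas.items():
    value = discounted_return(late_reward, gamma)
    print(f"{name}: gamma = {gamma:.6f}, value of end-of-match reward = {value:.3g}")
```

With the aiming-scale gamma the end-of-match reward is effectively invisible, while the match-scale and session-scale gammas still give it real weight. That is the sense in which each rate "sees" a different time scale.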