So instead of looking into continuity, I read about infra-Bayesianism, which looks fun (apparently there’s also an infra-Bayesian version of the complete class theorem!).
From what I can tell so far, it looks at what you can strip away from MDPs when some adversary sees your policy and picks the worst environment for it, and this has something to do with non-realisability? Cannot wait to read more.
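
A toy sketch of that worst-case idea as I currently understand it. The policy names, environment names, and payoff numbers are all made up for illustration, and this is only plain maximin over a finite set of environments, which is probably much simpler than what infra-Bayesianism actually does:

```python
# Hypothetical expected utilities: payoff[policy][environment].
# All names and numbers here are invented for illustration.
payoff = {
    "cautious": {"env_a": 0.5, "env_b": 0.4},
    "greedy": {"env_a": 0.9, "env_b": 0.1},
}


def worst_case(policy):
    # The adversary sees the policy and picks the environment
    # that gives it the lowest expected utility.
    return min(payoff[policy].values())


def maximin_policy():
    # Choose the policy whose worst-case utility is highest.
    return max(payoff, key=worst_case)


print(maximin_policy())
```

With these numbers the "greedy" policy does better in `env_a`, but the adversary would just pick `env_b` against it, so the worst-case comparison favours "cautious".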