**mesaoptimizer** @mesaoptimizer@schelling.pt · Aug 13, 2024

I'm reading Sutton and Barto's RL textbook, and I notice that the formalism of "reward signal" squishes together two inchoate concepts that are valuable to track separately: 'sensory feedback' from the 'environment', and the interpretation of that sensory feedback in terms of what it implies for your wantingness.
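
One way to make the split concrete is a minimal sketch in which the environment emits only an observation, and the mapping from observation to a scalar lives on the agent's side. This is an illustration rather than the textbook's interface, where the environment returns the reward together with the next state. All names here (`Observation`, `Environment`, `wants_paperclips`, `wants_distance`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Observation:
    """Raw sensory feedback: what the world reports, with no valuation attached."""
    position: int
    paperclips: int


class Environment:
    """Toy world (hypothetical) that emits observations only, never a reward."""

    def __init__(self) -> None:
        self.position = 0
        self.paperclips = 0

    def step(self, action: str) -> Observation:
        if action == "move":
            self.position += 1
        elif action == "make_clip":
            self.paperclips += 1
        return Observation(self.position, self.paperclips)


# The interpretation step belongs to the agent: it turns an observation
# into a number that says how good that observation is for this agent.
Interpreter = Callable[[Observation], float]


def wants_paperclips(obs: Observation) -> float:
    return float(obs.paperclips)


def wants_distance(obs: Observation) -> float:
    return float(obs.position)


env = Environment()
obs = env.step("make_clip")

# Same sensory feedback, two different wants, two different scalars.
rewards = {f.__name__: f(obs) for f in (wants_paperclips, wants_distance)}
```

The design choice is that `Environment.step` has no idea what anyone wants; swapping the interpreter changes the "reward" without touching the world.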

**mesaoptimizer** @mesaoptimizer@schelling.pt · 2024-08-13T12:06:33Z

I use the word "wantingness" deliberately here -- there are many ways you can want things. You can want to achieve a goal (one-time), you can want to maximize the number of paperclips in the world (a continual task that you can only have better or worse outcomes for), or you can want to stop wanting things (an example of a particularly difficult-to-formalize instance of wantingness).
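
A rough sketch of how the first two kinds of wanting might be written down, assuming the one-time goal is modelled as a terminating task and the paperclip case as a non-terminating one scored by a discounted sum. The function names are hypothetical, and the third kind (wanting to stop wanting) is deliberately left out, since it is not obvious how to express it in this form.

```python
from typing import List, Tuple


def goal_reward(reached_goal: bool) -> Tuple[float, bool]:
    """One-time goal: pay out once, then signal that the task is over."""
    return (1.0, True) if reached_goal else (0.0, False)


def paperclip_reward(prev_clips: int, clips: int) -> Tuple[float, bool]:
    """Continual task: score the change in paperclip count; never signals an end."""
    return (float(clips - prev_clips), False)


def discounted_return(rewards: List[float], gamma: float) -> float:
    """Sum of rewards weighted by gamma**t, accumulated from the last step back."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

For the continual case there is no "done" flag to wait for, so the only comparison available is between returns such as `discounted_return([1.0, 1.0, 1.0], 0.5)` and the return of some other behaviour.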
