The Q-Function paradox

The most rational way to live your life is also the worst way to live it.

This hit me once while talking to another reinforcement learning researcher. You see, in finite-horizon problems, you need a different Q-function for each time step. The optimal action at time t depends not just on your current state, but on how much time remains.

This makes sense. If you’re 25, taking two years off to backpack through Asia might be optimal. If you’re 55, probably not. The value of any given action depends critically on your remaining horizon. Put differently, no single policy is optimal at every age.
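To make this concrete, here is a minimal sketch of backward induction on a toy finite-horizon problem. Everything in it is an assumption made up for illustration: the two states ("unskilled", "skilled"), the two actions ("work", "explore"), the reward numbers, and the 10-step horizon. The only point is that the same state gets a different best action depending on how many steps are left.

```python
# Toy finite-horizon MDP. States, actions, and rewards are illustrative
# assumptions, not taken from any real model.
STATES = ("unskilled", "skilled")
ACTIONS = ("work", "explore")


def step(state, action):
    """Deterministic (reward, next_state) for the toy model."""
    if action == "work":
        # Working pays more once you are skilled; the state does not change.
        return (3.0 if state == "skilled" else 1.0), state
    # "explore": pay an up-front cost now, end up skilled afterwards.
    return -4.0, "skilled"


def backward_induction(horizon):
    """Compute one Q-function and one greedy policy per time step.

    Returns (q, policy), where q[t][(state, action)] is the value of taking
    `action` in `state` at time t and acting optimally afterwards, and
    policy[t][state] is the best action at time t.
    """
    value_next = {s: 0.0 for s in STATES}  # nothing is earned after the horizon
    q = [None] * horizon
    policy = [None] * horizon
    for t in reversed(range(horizon)):
        q_t = {}
        for s in STATES:
            for a in ACTIONS:
                reward, s_next = step(s, a)
                q_t[(s, a)] = reward + value_next[s_next]
        policy_t = {s: max(ACTIONS, key=lambda a: q_t[(s, a)]) for s in STATES}
        value_next = {s: q_t[(s, policy_t[s])] for s in STATES}
        q[t] = q_t
        policy[t] = policy_t
    return q, policy


q, policy = backward_induction(10)
for t, policy_t in enumerate(policy):
    print(t, policy_t["unskilled"])
```

With these made-up numbers, "explore" is the best move from the unskilled state for time steps 0 through 6 and "work" is best for steps 7 through 9: the investment only pays off if enough time remains to collect on it. Same state, different Q-function, different answer.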

Most people intuitively understand this. We talk about being in different “life stages.” We adjust our risk tolerance as we age. We make different career moves at 30 than at 50. This is rational behavior; we’re computing age-appropriate Q-functions in a way.

Now here is where it gets counterintuitive: while this time-dependent approach is mathematically correct, it might be psychologically toxic.