The Q-Function paradox
The most rational way to live your life is also the worst way to live it.
This hit me once while talking to another reinforcement learning researcher. You see, in finite horizon problems, you need a different Q-function for each time step. The optimal action at time t depends not just on your current state, but on how much time remains.
This makes sense. If you’re 25, taking two years off to backpack through Asia might be optimal. If you’re 55, probably not. The value of any given action depends critically on your remaining horizon. Put differently, you cannot live your whole life with the same policy; the right one changes with your age.
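For the RL-minded, here is a minimal sketch of that time dependence, using backward induction on a toy problem. Everything in it is an assumption made up for illustration: the two states ("unskilled" and "skilled"), the two actions ("work" and "invest"), the reward numbers, and the function names. It is not meant as a model of anyone's actual life.

```python
# Toy deterministic MDP (all numbers are made up for illustration).
# States: 0 = unskilled, 1 = skilled. Actions: 0 = work, 1 = invest.
REWARD = {(0, 0): 1, (0, 1): -2, (1, 0): 3, (1, 1): -2}
NEXT_STATE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
STATES, ACTIONS = (0, 1), (0, 1)


def finite_horizon_q(horizon):
    """Backward induction: returns q[t][(s, a)] for t = 0..horizon-1."""
    value_next = {s: 0.0 for s in STATES}  # V_H = 0: nothing after the end
    q = [None] * horizon
    for t in reversed(range(horizon)):
        # Q_t(s, a) = r(s, a) + V_{t+1}(next state)
        q[t] = {
            (s, a): REWARD[s, a] + value_next[NEXT_STATE[s, a]]
            for s in STATES
            for a in ACTIONS
        }
        # V_t(s) = max_a Q_t(s, a)
        value_next = {s: max(q[t][s, a] for a in ACTIONS) for s in STATES}
    return q


def greedy_action(q_t, state):
    """Best action in `state` under the Q-function of one time step."""
    return max(ACTIONS, key=lambda a: q_t[state, a])


q = finite_horizon_q(horizon=5)
policy_unskilled = [greedy_action(q_t, 0) for q_t in q]
print(policy_unskilled)  # [1, 1, 1, 0, 0]
```

In the same "unskilled" state, the greedy action is to invest while three or more steps remain and to just work once only two are left. Same state, different Q-function, different answer.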
Most people intuitively understand this. We talk about being in different “life stages.” We adjust our risk tolerance as we age. We make different career moves at 30 than at 50. This is rational behavior; we’re computing age-appropriate Q-functions in a way.
Now here is where it gets counterintuitive: while this time-dependent approach is mathematically correct, it might be psychologically toxic.
The finite horizon mindset does something subtle and destructive. It turns every decision into a countdown. It makes you calculate whether you have “enough time left” for new projects, relationships, or skills. It whispers that opportunities have expiration dates. More crucially, it makes you dread new projects, shy away from new opportunities, and sometimes even avoid learning new things, for fear of lacking “time”.
One of my biggest fears in life is what actually happens to a lot of people in their 40s and 50s who fully embrace finite horizon thinking. They become conservative, risk-averse, focused on protecting what they have rather than growing. They lose the “child” in them, this radical learning machine who doesn’t care much about losing the game as long as she has fun, who finds no joy in easy wins anyway… (anybody who got stuck for hours trying to defeat the final boss of their childhood game will hopefully relate to this).
If you start sentences with “If I were younger…” you are treating your life like an optimization problem with a hard deadline. The alternative is what I call the infinite horizon mindset. Act as if you have unlimited time steps remaining. Value learning for its own sake. Optimize for the process, not just the outcome. Take on projects because they’re interesting, not because they fit your age bracket.
This seems irrational. How can you ignore the obvious constraint of finite time? But wisdom often looks irrational from the outside. The infinite horizon approach does something crucial: it keeps your intrinsic motivation alive. It preserves the sparkling, curious child in you. When you’re not constantly discounting future rewards by remaining lifespan, you stay alert. You keep building. You remain open to serendipity. You stay alive.
I know 70-year-olds who are learning to play piano and 30-year-olds who act like their lives are over. The difference isn’t age; they simply have different Q-functions. I use finite horizon thinking for the big structural decisions: having kids, buying a house, changing careers. But for everything else, for how you spend your days, what you’re curious about, how you engage with the world, run the infinite horizon algorithm.
The paradox resolves like this: you need both systems running simultaneously. Plan rationally, but live infinitely. A double Q-function: a time-dependent one and an infinite horizon one… wait, perhaps I should spin up OpenAI Gym and start experimenting…