Online performance of reinforcement learning with internal reward functions
We consider reinforcement learning under the paradigm of online learning, where the objective is good performance during the whole learning process. This is in contrast to the typical analysis of reinforcement learning, where one is interested in eventually learning a near-optimal strategy. We will conduct a mathematically rigorous analysis of reinforcement learning under this alternative paradigm and expect novel and efficient learning algorithms as a result.
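The online objective is usually formalized as minimizing cumulative regret: the total reward forgone relative to an optimal strategy, summed over the entire run rather than measured only at the end. The following minimal sketch (our own illustration; the reward trace and the `cumulative_regret` helper are assumptions for demonstration) shows how a learner that improves steadily can keep its lifetime regret bounded:

```python
def cumulative_regret(rewards, optimal_reward):
    """Regret accumulated over the whole learning process: the gap
    between the optimal per-step reward and what was actually earned."""
    regret = 0.0
    trace = []
    for r in rewards:
        regret += optimal_reward - r
        trace.append(regret)
    return trace

# Hypothetical learner whose per-step reward climbs from 0.2 toward
# the optimum 0.9 as it learns.
rewards = [0.9 - 0.7 * (0.95 ** t) for t in range(100)]
trace = cumulative_regret(rewards, optimal_reward=0.9)
# The per-step gap shrinks geometrically, so total regret stays bounded.
```

A learner judged only by its final policy could explore arbitrarily badly early on; the regret criterion penalizes exactly that.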
We believe the proposed online paradigm offers significant benefits for intelligent interfaces, as such an interface would deliver reasonable performance even early in the training process.
The starting point of our analysis will be the method of upper confidence bounds, which has already proved very effective for simplified versions of reinforcement learning. To carry the analysis over to realistic problems with large or continuous state spaces, we will estimate the utility of states by value function approximation through kernel regression. Kernel regression is a well-founded function approximation method related to support vector machines and holds significant promise for reinforcement learning.
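The two ingredients can be combined as in the sketch below (our own illustration, not the proposal's algorithm): a Nadaraya-Watson kernel regression estimates a state's value from observed (state, return) pairs, and a UCB-style bonus, shrinking with the effective number of nearby samples, supplies optimism in the face of uncertainty. The Gaussian kernel, the bandwidth, and the bonus form are all assumptions chosen for concreteness.

```python
import math

def gaussian_kernel(x, xp, bandwidth=0.5):
    return math.exp(-((x - xp) ** 2) / (2 * bandwidth ** 2))

def kernel_value(x, data, bandwidth=0.5):
    """Nadaraya-Watson value estimate at state x from (state, return)
    pairs, plus a UCB-style exploration bonus that shrinks as the
    effective local sample count grows."""
    weights = [gaussian_kernel(x, s, bandwidth) for s, _ in data]
    total = sum(weights)
    if total == 0.0:
        return 0.0, float("inf")
    mean = sum(w * g for w, (_, g) in zip(weights, data)) / total
    bonus = math.sqrt(2 * math.log(len(data) + 1) / total)
    return mean, bonus

# Two samples near state 0 with high returns, one far away with a low return.
data = [(0.0, 1.0), (0.1, 0.8), (2.0, 0.1)]
mean, bonus = kernel_value(0.05, data)          # densely sampled region
far_mean, far_bonus = kernel_value(1.5, data)   # sparsely sampled region
```

An optimistic agent acting on `mean + bonus` is thus drawn toward sparsely sampled regions, which is the mechanism behind upper-confidence-bound exploration.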
Finally, we are interested in methods for reinforcement learning where little or no external reinforcement is provided to the learning agent. Since useful external rewards are often hard to come by, we will investigate the creation of internal reward functions that drive the consolidation and extension of learned knowledge, mimicking cognitive behaviour.
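One common way to construct such an internal reward, shown here as a toy sketch under our own assumptions (the `CuriosityAgent` class and its tabular model are hypothetical, not the proposal's method), is to reward the agent by the prediction error of its own transition model: surprising transitions yield large internal reward and attract further exploration, while well-modelled ones cease to be rewarding as knowledge is consolidated.

```python
class CuriosityAgent:
    """Toy prediction-error curiosity: the agent keeps a simple model of
    where each state leads and pays itself an internal reward equal to
    the model's error on the observed transition."""

    def __init__(self, lr=0.5):
        self.model = {}  # state -> predicted next state (defaults to 0.0)
        self.lr = lr

    def internal_reward(self, state, next_state):
        pred = self.model.get(state, 0.0)
        error = abs(next_state - pred)
        # Consolidation: move the prediction toward the observation,
        # so repeating the same transition earns less and less reward.
        self.model[state] = pred + self.lr * (next_state - pred)
        return error

agent = CuriosityAgent()
r1 = agent.internal_reward(0, 1.0)  # novel transition: large internal reward
r2 = agent.internal_reward(0, 1.0)  # repeated transition: reward shrinks
```

The decaying reward for familiar transitions is what pushes the agent to extend its knowledge to unexplored parts of the state space without any external signal.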