Online performance of reinforcement learning with internal reward functions
We consider reinforcement learning under
the paradigm of online learning, where the objective is good performance during
the whole learning process. This is in contrast to the typical analysis of
reinforcement learning, where one is interested in learning a strategy that is
near-optimal once learning has finished. We will conduct a mathematically
rigorous analysis of reinforcement learning under this alternative paradigm and
expect it to yield novel and efficient learning algorithms.
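One common way to make "good performance during the whole learning process"
precise is the regret after T steps. This is an assumption used for
illustration, not a definition taken from the proposal. Here rho^* is the
optimal long-run average reward of the environment and r_t is the reward the
learner actually collects at step t:

```latex
\mathrm{Regret}(T) \;=\; T\,\rho^{*} \;-\; \sum_{t=1}^{T} r_t
```

A learner whose regret grows sublinearly in T has an average reward per step
that approaches rho^*. A method that eventually finds a near-optimal strategy
but explores carelessly along the way can still accumulate large regret.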
We believe that the proposed online paradigm offers significant benefits for
intelligent interfaces, since such an interface would deliver reasonable
performance even early in the training process.
The starting point for our analysis will be the method of upper confidence
bounds, which has already proved very effective for simplified versions of
reinforcement learning such as multi-armed bandit problems. To carry the
analysis over to realistic problems with large or continuous state spaces, we
will estimate the utility of states by value function approximation through
kernel regression. Kernel regression is a well-founded function approximation
method related to support vector machines and holds significant promise for
reinforcement learning.
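The two ingredients named above can be sketched in a few lines of Python. The
sketch makes two assumptions. It uses the UCB1 rule on Bernoulli multi-armed
bandits as the "simplified version" of reinforcement learning. It uses the
Nadaraya-Watson estimator with a Gaussian kernel as a stand-in for kernel
regression on one-dimensional states. The function names, the bandwidth and the
toy data are hypothetical and are not taken from the proposal.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """UCB1 on Bernoulli arms (illustrative sketch).

    Returns the total reward collected and the number of pulls per arm.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # play every arm once to initialise its estimate
        else:
            # Optimism: empirical mean plus a confidence radius that
            # shrinks as the arm is pulled more often.
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts


def gaussian_kernel(x, y, bandwidth):
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def kernel_value_estimate(state, samples, bandwidth=0.5):
    """Nadaraya-Watson estimate of V(state) from (state, return) samples.

    Hypothetical stand-in for the kernel regression named in the text.
    """
    weights = [gaussian_kernel(state, s, bandwidth) for s, _ in samples]
    total_weight = sum(weights)
    if total_weight == 0.0:
        return 0.0  # no nearby data: fall back to a default value
    return sum(w * g for w, (_, g) in zip(weights, samples)) / total_weight


if __name__ == "__main__":
    total, counts = ucb1([0.2, 0.5, 0.8], horizon=2000)
    print("pulls per arm:", counts)  # the 0.8 arm should receive most pulls
    samples = [(x / 10, math.sin(x / 10)) for x in range(31)]
    print("V(1.5) ~", kernel_value_estimate(1.5, samples, bandwidth=0.3))
```

The confidence radius sqrt(2 ln t / n) keeps exploration focused on arms that
might still be optimal, which is what keeps regret low during learning rather
than only at the end. A kernel method closer to support vector machines, such
as kernel ridge regression, would replace the weighted average with the
solution of a regularised linear system.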
Finally, we are interested in methods for reinforcement learning in which the
learning agent receives little or no external reinforcement. Since useful
external rewards are often hard to come by, we will investigate the creation of
internal reward functions that drive the consolidation and extension of learned
knowledge, mimicking cognitive behaviour.
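One hypothetical illustration of an internal reward function is a count-based
novelty bonus beta / sqrt(N(s)), where N(s) counts visits to a discrete state
s. It is not necessarily the mechanism the proposal will develop. The sketch
assumes this form and covers only the "extension" side: rarely visited states
earn a larger bonus, which pushes the agent towards unfamiliar parts of the
environment even when the external reward is zero.

```python
import math
from collections import defaultdict


class CountBasedIntrinsicReward:
    """Hypothetical internal reward: novelty bonus beta / sqrt(N(s))."""

    def __init__(self, beta=1.0):
        self.beta = beta
        self.visits = defaultdict(int)  # N(s): visit count per discrete state

    def __call__(self, state):
        # Record the visit, then reward inversely to how familiar the state is.
        self.visits[state] += 1
        return self.beta / math.sqrt(self.visits[state])


def combined_reward(external, state, intrinsic):
    """Total learning signal: external reward (possibly 0) plus internal bonus."""
    return external + intrinsic(state)


if __name__ == "__main__":
    bonus = CountBasedIntrinsicReward(beta=1.0)
    for s in ["a", "a", "a", "b"]:
        print(s, combined_reward(0.0, s, bonus))  # bonus decays for repeated "a"
```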