TITLE: Introducing Constructive Contextual Reinforcement Learning [CCRL]

References:
KAELBLING, Leslie Pack / LITTMAN, Michael L. / MOORE, Andrew W. [1996]: Reinforcement Learning: A Survey.
http://www.cs.brown.edu/people/lpk/rl-survey.html
CONTENT:
3.1 The Non-Behavioral Learning Concept
3.2 External Reward
5.1 Behavioral Learning Concepts
5.2 Learning with Neural Nets based on the INM-Neuron
5.3 Cognitive Learning Concepts
LTD-R centers its focus on self-learning agents, human as well as transhuman, in the domain of knowledge management.
Learning is seen here primarily from a behavioral point of view, but in parallel also from a physiological point of view with a phenomenological perspective as a possible third dimension.
To realize this task one has successively to set up a formal model of the environment in which learning is assumed to happen, localized in agents interacting with this environment. Besides this, a formal model of the internal processes of these acting agents has to be elaborated, in correlation with the observable interactions and environmental states.
The general outline of such a formal theory, together with methods of measurement and computational models, is described in another paper (see: "LTD-R I+II Outline of Research").
To specify a bit more concretely which kind of learning paradigm we are working with, we will describe a first paradigm, induced by a discussion of the Reinforcement Learning paradigm, for which KAELBLING et al. (1996) give an exciting overview.
The general outline of the reinforcement paradigm can be represented by the following diagram:
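In code form, the interaction cycle of this paradigm can be written down as a minimal sketch: the agent observes the current environmental state, selects an action, and receives the next state together with a reward delivered by the environment. The following Python fragment is our own illustration under these assumptions; the class names Environment and Agent and the toy dynamics are placeholders, not taken from the cited survey.

import random

# Toy rendering of the reinforcement-learning cycle: the environment emits both
# the next state and the reward; the agent only chooses actions.
class Environment:
    def __init__(self):
        self.state = 0
    def step(self, action):
        self.state = (self.state + action) % 10
        reward = 1.0 if self.state == 0 else 0.0   # reward comes from the environment
        return self.state, reward

class Agent:
    def choose_action(self, state):
        return random.choice([-1, 1])              # pure exploration, no learning yet

env, agent = Environment(), Agent()
state = env.state
for t in range(20):
    action = agent.choose_action(state)
    state, reward = env.step(action)               # state and reward flow back to the agent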
Two problems with the reinforcement paradigm shall especially be mentioned here. The first is related to the methodological problem of using a learning concept which is not strictly 'behavioral'. The second is related to the assumption that the reward is coming from the environment.
3.1 The Non-Behavioral Learning Concept
The non-behavioral character of the central learning concept within reinforcement learning is revealed by the following facts: within reinforcement learning, the term 'learning' is bound to the goal of maximizing the amount of reward 'in the long run'. This goal of 'maximizing reward' is connected to different procedures for obtaining this maximum. These goal-obtaining procedures are a mixture of exploring alternatives, evaluating alternatives, and selecting a next step, and this is repeated several times over a certain period of time. Procedures which lead in this way to the optimal value of reward are called 'learning procedures'.
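As an illustration of such a 'learning procedure' in the reinforcement sense, the following sketch repeats the cycle of exploring alternatives, evaluating them, and selecting a next step until the action with the highest estimated reward dominates. The two-armed bandit, the value estimates, and the exploration rate epsilon are assumptions of this sketch, not part of the paradigm's definition.

import random

def bandit_reward(action):
    # Hypothetical two-armed bandit; the true mean rewards are hidden from the learner.
    return random.gauss([0.2, 0.8][action], 0.1)

values = [0.0, 0.0]     # evaluation: running reward estimates per alternative
counts = [0, 0]
epsilon = 0.1           # fraction of steps spent exploring

for t in range(1000):
    if random.random() < epsilon:
        a = random.randrange(2)            # explore an alternative
    else:
        a = values.index(max(values))      # select the currently best step
    r = bandit_reward(a)
    counts[a] += 1
    values[a] += (r - values[a]) / counts[a]   # incremental evaluation update
# Repeated long enough, the procedure settles on the alternative with the
# 'optimal value of reward' -- the criterion discussed in the text.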
From a methodological point of view, a concept like the 'optimal value of reward' with regard to a certain individual system induces some problems.
The 'maximum reward' is not a behavioral concept! It is related to some internal states of an individual system. One consequence of this is that the same environment can be seen quite differently depending on the individual settings of learning. Although this 'individualization' may be useful for certain theoretical investigations, it is not useful with regard to the challenge of defining a 'learning task' without relying on special conditions of the learner. What should count here is the fact that a certain learner L is able to solve a certain task within a finite period of time, without being forced to include the individual inner states of the learner L.
This is one reason why, in LTD, the term 'learning' is primarily bound not to the concept of reward but to the concept of a 'task' and the 'solution of a task' within a finite period of time.
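A minimal sketch of this behavioral criterion, under our own assumptions about how a task exposes its observable states (the class ToyTask and its method names are illustrative only): a learner counts as having learned if it transforms the task into a solved state within a finite number of steps, without any reference to its inner states.

class ToyTask:
    """Illustrative task: reach state 5 from state 0; only observable states are exposed."""
    def initial_state(self):
        return 0
    def is_solved(self, state):
        return state == 5
    def apply(self, state, action):
        return state + action        # environmental 'law' governing this toy task

def has_learned(learner, task, max_steps):
    """True iff the learner solves the task within max_steps time intervals."""
    state = task.initial_state()
    for _ in range(max_steps):
        if task.is_solved(state):
            return True
        state = task.apply(state, learner(state))   # only observable quantities are used
    return task.is_solved(state)

# The learner is judged purely by its observable behavior:
print(has_learned(lambda s: 1, ToyTask(), max_steps=10))   # True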
But even if one explicitly introduces a behavioral concept of learning, one has to clarify what the methodological status of the 'internal states' of a learner can be. As long as one deals only with formal systems or algorithms as such, this poses no problem; but at the moment one applies these formal concepts to empirical entities, the conditions change.
In the realm of empirical systems there are three main possibilities to deal with the 'internal states' of a system: (i) from a behavioral (S-R) point of view you are 'guessing' internal states by introducing 'theoretical terms' into your theory; (ii) from a physiological or mechanical (N) point of view you are looking into the system; and (iii) from some 'conscious' or 'inward' point of view (P) you have some 'direct' experience of phenomena which you can try to articulate with the means of a language (additionally, one has to take into account the implicit restrictions of 'inward views' of neural systems).
Within LTD we will prefer case (ii). This implies that one has to correlate the behavioral learning concept explicitly with the mechanical/physiological concept. In the 'mainstream' reinforcement paradigm there is no discussion of this topic.
3.2 External Reward
The assumption that reward comes from the environment to a learning system seems to us highly implausible as long as one is dealing with biological learning systems. Thus, within LTD we will locate the source of reward in the system itself!
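The following sketch illustrates this relocation of the reward source under our own assumptions (the names Environment, Learner, and the energy variable are placeholders): the environment returns only observable states, and the reward signal is computed by the learner from its internal states.

import random

class Environment:
    def __init__(self):
        self.food_present = False
    def step(self, action):
        # The environment only returns an observable state -- no reward.
        self.food_present = (action == "search" and random.random() < 0.3)
        return {"food_present": self.food_present}

class Learner:
    def __init__(self):
        self.energy = 0.5                            # internal state of the system
    def internal_reward(self, observation):
        # The reward is constructed from internally usable states.
        if observation["food_present"]:
            reward = 1.0 - self.energy               # food is more rewarding when energy is low
            self.energy = min(1.0, self.energy + 0.2)
            return reward
        self.energy = max(0.0, self.energy - 0.05)
        return 0.0

env, learner = Environment(), Learner()
for t in range(10):
    observation = env.step("search")
    r = learner.internal_reward(observation)         # reward generated inside the learner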
The main idea behind CCRL is the assumption that learning presupposes at least the following components:
These basic assumptions imply some additional assumptions:
The assumption that reinforcement is not a 'given' part of the environment has some resemblance to so-called constructive epistemology. In constructive epistemology it is assumed that the view of the world is an internal construction of the system, which builds this internal view on the states which are internally usable.
In this sense one can call this approach not only constructive but also contextual; the learner cannot be defined without an explicit account of certain aspects of the environment (the 'task'), which work as a kind of 'index' to classify the learner.
It is an interesting question to which extent reward can be 'connected' to states which are only 'indirectly' linked to the 'original' reward states.
Here are some more concrete details on the above assumptions.
From these assumptions we get the following simplified structural descriptions (leaving out details of space, time, aggregations, etc.):
5.1 Behavioral Learning Concepts
As said before, LTD follows a twofold strategy: first, establishing a behavioral learning concept without any relationship to the inner states of the learner, and then establishing a cognitive (mechanistic/physiological) learning concept which will explicitly be related to the behavioral concept.
In a first step we will characterize the concepts 'task' and 'solution of a task' within the behavioral paradigm.
The main idea is as follows: a 'task' will be characterized as a realizable sequence of configurations of environmental states, consisting of a 'key sequence', an optional 'transfer sequence', and a 'solution sequence'. The 'transitions' between the parts of any of these sequences can be triggered either by 'actions' of the learner (directly, or indirectly through 'instrumental actions') or by general laws/rules governing the behavior of environmental states.
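One possible data-structure reading of this task notion, as a sketch (the field and type names are our assumptions): a task consists of a key part, an optional transfer part, and a solution part, and every transition records what triggers it.

from dataclasses import dataclass
from typing import Literal, Optional

Trigger = Literal["learner_action", "instrumental_action", "environmental_rule"]

@dataclass
class Transition:
    source: dict                           # environmental configuration before
    target: dict                           # environmental configuration after
    trigger: Trigger
    required_action: Optional[str] = None  # set when the learner has to trigger it

@dataclass
class Task:
    key_sequence: list
    transfer_sequence: list                # may be empty (the optional part)
    solution_sequence: list

    def transitions(self):
        return self.key_sequence + self.transfer_sequence + self.solution_sequence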
As long as it is not possible to characterize tasks of this kind independently of any inner states of a learner, the task does not exist from the point of view of a learning theory.
What has to be thought of as an 'environmental state' depends on the kind of environment one wants to work with.
A task will be called a serious task if the potential behavior of a learner is sufficiently 'powerful' to produce all those actions which are necessary to transform a key sequence into a 'solution sequence' after finitely many time intervals.
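This notion of a serious task can be operationalized, for example, as the following check (a sketch; the list of required actions and the bound on time intervals are assumptions of this illustration): the learner's repertoire must contain every action needed on the way from the key sequence to the solution sequence, and the number of needed steps must lie within a finite bound.

def is_serious(required_actions, learner_repertoire, max_intervals):
    """required_actions: actions needed along key -> (transfer) -> solution sequence."""
    return (len(required_actions) <= max_intervals
            and set(required_actions) <= set(learner_repertoire))

# Example with hypothetical action names:
print(is_serious(["press", "pull"], ["press", "pull", "wait"], max_intervals=10))   # True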
Because every serious task can only be a highly idealized description of a small fragment of a 'real' environment, it is clear that a 'real' learner in a 'real' environment can use the learned serious tasks only as a starting point for a more advanced generalization about the presupposed 'bigger' environment.
Apart from the description of the learning task as such, one can also describe the behavior of a learner as a sequence of interactions with the environment and the task, i.e. we will have a learning sequence for learner L1 consisting of learning units with the following format:
<Learner, time period, environmental states, position of learner, actual task partition, action(s) of learner, new environmental states, new position of learner, new actual task partition>.
Because a learning sequence can contain 'implicit' loops, one can also represent the learning space as a directed cyclic graph.
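The learning unit format and the resulting learning space can be sketched as follows (the field names follow the format given above; the graph construction itself is our illustration): repeated situations in a learning sequence produce cycles in the directed graph.

from dataclasses import dataclass

@dataclass(frozen=True)
class Situation:
    env_states: tuple
    learner_position: tuple
    task_partition: str                  # e.g. "key", "transfer", "solution"

@dataclass
class LearningUnit:
    learner: str
    time_period: int
    before: Situation                    # situation before the action(s)
    actions: tuple
    after: Situation                     # situation after the action(s)

def learning_space(learning_sequence):
    """Directed graph over situations; revisited situations yield cycles."""
    edges = {}
    for unit in learning_sequence:
        edges.setdefault(unit.before, set()).add(unit.after)
    return edges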
5.2 Learning with Neural Nets based on the INM-Neuron
The main idea of a learner endowed with a neural net is shown in the following diagram:
In a simple case there is a mapping between the states of a learner and dedicated neural nets which are interconnected.
For every neuron in a net we assume that the following information is available:
<identifier, value dimension, general position and orientation, dimension, {membrane positions}, {synapse positions with orientations}>
The idea behind this is that a neural net, as part of a body in a real environment, has to cope with space and with different types of stimuli. This presupposes that the body of the learner has a defined shape, an actual position, and an actual orientation. Relative to the body's geometry it is then possible to determine the position of a neuron within a 'real' environment. Besides this, the information about the value dimension allows us to determine which kinds of stimuli of the environment can trigger a certain sensory neuron at some time interval.
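As a sketch of this per-neuron information and of the triggering condition for sensory neurons (the 2D geometry, the reach parameter, and the matching rule are assumptions of this illustration):

from dataclasses import dataclass
import math

@dataclass
class Neuron:
    identifier: str
    value_dimension: str                 # e.g. "light", "pressure", "chemical"
    position: tuple                      # position relative to the body's geometry
    orientation: float                   # orientation angle in radians
    size: float                          # spatial extent ('dimension')
    membrane_positions: tuple
    synapse_positions: tuple             # (position, orientation) pairs

def can_trigger(neuron, stimulus_kind, stimulus_position, reach=1.0):
    """A stimulus can trigger a sensory neuron in a given time interval if it
    matches the neuron's value dimension and lies within reach of its position."""
    if stimulus_kind != neuron.value_dimension:
        return False
    dx = stimulus_position[0] - neuron.position[0]
    dy = stimulus_position[1] - neuron.position[1]
    return math.hypot(dx, dy) <= reach + neuron.size

# Hypothetical usage:
sensor = Neuron("s1", "light", (0.0, 0.0), 0.0, 0.1, (), ())
print(can_trigger(sensor, "light", (0.5, 0.5)))   # True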
5.3 Cognitive Learning Concepts
We can distinguish two versions of cognitive learning concepts: 'model-centered' and 'correlational'.
The correlational version of the cognitive learning concept 'only' correlates task-specific activities with those neural activities which are specific for these activities, as compared to those neural activities which happen 'in any case'.
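One way to operationalize this correlational reading, as a sketch (the per-neuron mean-difference score is our assumption; other measures of specificity would serve equally well): activity recorded during task-specific behavior is compared with baseline activity recorded 'in any case'.

def task_specific_activity(task_recordings, baseline_recordings):
    """Per-neuron mean activity during the task minus mean baseline activity.
    Both arguments are lists of per-time-interval activity vectors of equal length."""
    n = len(task_recordings[0])
    task_mean = [sum(rec[i] for rec in task_recordings) / len(task_recordings)
                 for i in range(n)]
    base_mean = [sum(rec[i] for rec in baseline_recordings) / len(baseline_recordings)
                 for i in range(n)]
    return [t - b for t, b in zip(task_mean, base_mean)]

# Hypothetical recordings for three neurons:
print(task_specific_activity([[0.9, 0.1, 0.5], [0.8, 0.2, 0.5]],
                             [[0.1, 0.1, 0.5], [0.2, 0.2, 0.5]]))
# -> roughly [0.7, 0.0, 0.0]: only the first neuron is specific for the task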
The model-centered version assumes that it is possible to identify, within the activities of a neural net, such activities as can be 'understood' as representing a 'model' of the environment. Insofar as this is possible, one can compare this model with the environment 'as such'. The 'content' of learning would then be measured with regard to the 'richness' and 'adequacy' of the model compared to the 'real' environment.
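As a sketch of one possible measure of 'adequacy' in this model-centered reading (the prediction-accuracy score and the toy two-state environment are assumptions; 'richness' is not modeled here): the internal model is treated as a predictor of the next environmental state and compared against the states that actually occur.

def model_adequacy(internal_model, observed_transitions):
    """internal_model: callable mapping a state to a predicted next state.
    observed_transitions: list of (state, next_state) pairs from the environment."""
    hits = sum(1 for s, s_next in observed_transitions if internal_model(s) == s_next)
    return hits / len(observed_transitions)

# Hypothetical environment whose states alternate between 'A' and 'B':
model = lambda s: "B" if s == "A" else "A"
print(model_adequacy(model, [("A", "B"), ("B", "A"), ("A", "B")]))   # 1.0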
In 'the long run', the LTD learning theory is intended to work with such a cognitive learning concept.