
The neural network described above has a set of binary inputs (one for each ray-sensor) that denote the presence (1) or absence (0) of a token (independent of its type). Therefore, in an environment in which there are multiple types of tokens, the only way for an individual to distinguish between them is to pick them up and observe the change in energy. If the environment in which the robot operates is known a priori, then clearly, the neural network could be designed to include relevant information about each token type. However, if the environment is unknown, then the robot must learn to adapt to the different types and values of tokens it may encounter.

¹ The information cannot be encoded directly to the network without a priori knowledge of the number

6.3 Designing the Lifetime Adaptation Mechanism 100

The lifetime-learning mechanism proposed here adds qualitative information about the detected token instead of just a binary indicator. It extends a robot's ability from only detecting the presence of a token to distinguishing between different token types and their relative importance. This is achieved by adding a multiplier to each of the NN's token detection inputs. Like a set of filters, the appropriate multiplier value is chosen based on the detected token type. The multiplier for each type of token is a continuous value in the range [−1, 1], which allows the input to carry more information than a binary flag.
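The filtering step can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the function name `scale_token_inputs`, the use of `None` for "no token detected", and the token type names are all assumptions made for the example.

```python
def scale_token_inputs(detections, multipliers):
    """Turn per-sensor detections into multiplier-weighted network inputs.

    detections: one entry per ray-sensor, a token type or None (hypothetical encoding).
    multipliers: mapping from token type to its multiplier in [-1, 1].
    """
    inputs = []
    for token_type in detections:
        presence = 1.0 if token_type is not None else 0.0  # original binary input
        # The multiplier of the detected type replaces the plain 0/1 signal.
        inputs.append(presence * multipliers.get(token_type, 0.0))
    return inputs
```

With three sensors detecting a "red" token, nothing, and a "blue" token, the network would receive the two type-specific multipliers and a zero, rather than the binary pattern 1, 0, 1.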

Every time a previously unseen type of token is encountered (detected by a sensor ray or through consumption), a new value is added to the set. As tokens are usually² detected before they are consumed, no information regarding the token's value is known: the robot therefore randomly initialises a value to associate with the type ($x$) of detected token. Following consumption, the resulting change of energy is detected by the robot and its learning mechanism can modify the corresponding multiplier value ($m_x$).

² As robots only have a discrete number of ray-sensors and not a full field of detection around their body, only objects crossing a sensor ray can be detected. This can lead to a situation in which a robot drives over a token before any of the ray-sensors detect it.

All multiplier values are adjusted every time a token is consumed, according to equation 6.1:

$$m_x' = m_x + \mathit{LS} \times \left(\mathit{LR} - \frac{C_x}{C_{total}}\right) \times \frac{V_x}{V_{max} - V_{min}} \qquad (6.1)$$

where $m_x$ is the current value of the multiplier for type $x$; $C_x$ is the number of tokens of type $x$ collected; $C_{total}$ is the total number of all tokens collected; $V_x$ is the value of the token that has just been consumed and is therefore now known to the robot (being equivalent to the change in energy); and $V_{max}$ and $V_{min}$ are the maximum and minimum values of all tokens encountered so far. LR is a learning rate that controls the magnitude of the change, and LS is either −1 or +1 and simply inverts the direction of change; this is required to adjust the learning mechanism to the internal value notation of the neural network and is adapted via evolution alongside the genome. The learning mechanism is shown in algorithm 4.
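As a worked illustration of a single update, assume equation 6.1 is read as $m_x' = m_x + \mathit{LS} \times (\mathit{LR} - C_x / C_{total}) \times V_x / (V_{max} - V_{min})$. All numbers below are hypothetical and chosen only for easy arithmetic, except LR = 1.02, which is the initial value listed in table 6.2.

```latex
% Hypothetical state: m_x = 0.2, LS = +1, LR = 1.02,
% C_x = 3, C_total = 10, V_x = 50, V_max = 100, V_min = 0
\begin{align*}
m_x' &= 0.2 + (+1) \times \left(1.02 - \frac{3}{10}\right) \times \frac{50}{100 - 0} \\
     &= 0.2 + 0.72 \times 0.5 \\
     &= 0.56
\end{align*}
```

With LS = −1 and the same state, the step has the same magnitude but the opposite direction, giving 0.2 − 0.36 = −0.16. Because LR is at least 1 and $C_x / C_{total}$ is at most 1, the bracketed term is never negative under this reading, so the direction of change is set by LS and the sign of $V_x$.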

if token_x is unknown then
    multipliers.add(token_x);
end
if token_x is consumed then
    tokenCounter_x.update(token_x);
    totalTokenCount.update();
    tokenValue_x.update(δE(t) − δE(t − 1));
    totalValueRange.update();
    for m_x in multipliers do
        m_x.update(); // eq. 6.1
    end
end

Algorithm 4: Pseudo code of the steps carried out to update all multipliers every time a token is encountered.
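The steps of algorithm 4 could be written in Python roughly as below. This is a sketch under stated assumptions rather than the code used in the experiments: the class and method names (`TokenLearner`, `on_detect`, `on_consume`) are hypothetical; each multiplier is updated with the count and last known value of its own type; types that have been detected but never consumed are skipped; the update is skipped while the value range is zero; and results are clamped to [−1, 1].

```python
import random
from dataclasses import dataclass, field


@dataclass
class TokenLearner:
    learning_rate: float = 1.02  # LR, initial value from table 6.2
    learning_sign: float = 1.0   # LS, either -1 or +1
    multipliers: dict = field(default_factory=dict)  # m_x per token type
    counts: dict = field(default_factory=dict)       # C_x per token type
    values: dict = field(default_factory=dict)       # last known V_x per type
    v_min: float = float("inf")
    v_max: float = float("-inf")
    rng: random.Random = field(default_factory=random.Random)

    def on_detect(self, token_type):
        """Register an unseen token type with a random multiplier in [-1, 1]."""
        if token_type not in self.multipliers:
            self.multipliers[token_type] = self.rng.uniform(-1.0, 1.0)
            self.counts[token_type] = 0

    def on_consume(self, token_type, energy_change):
        """Record the consumed token, then update all multipliers (eq. 6.1)."""
        self.on_detect(token_type)  # a token may be consumed before it is seen
        self.counts[token_type] += 1
        self.values[token_type] = energy_change
        self.v_min = min(self.v_min, energy_change)
        self.v_max = max(self.v_max, energy_change)
        total = sum(self.counts.values())
        value_range = self.v_max - self.v_min
        if value_range == 0:
            return  # assumption: no update until two distinct values are known
        for x in self.multipliers:
            if x not in self.values:
                continue  # assumption: value of a merely detected type is unknown
            step = (self.learning_rate - self.counts[x] / total) * self.values[x] / value_range
            updated = self.multipliers[x] + self.learning_sign * step
            self.multipliers[x] = max(-1.0, min(1.0, updated))  # assumption: clamp
```

A robot controller would call `on_detect` whenever a ray-sensor reports a token type and `on_consume` with the observed energy change after pick-up. The zero-range guard and the clamping are design choices of this sketch; the text above does not state how those cases are handled.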

Three factors influence the learning mechanism: the initial value assigned to a token $V_x$, the learning rate LR and the associated sign LS. These factors can be randomly assigned, fixed to some specific value or can themselves be subject to evolution. Allowing the learning sign to co-evolve enables the learning mechanism to self-adapt to the internal value convention of the neural network. Finally, enabling the robot to evolve an appropriate starting value for each type of token based on its experience may speed up learning in some circumstances. Even though token values change over seasons, inheriting a good starting value may be beneficial, with the benefit presumably depending on the rate of change of the environment. This algorithm makes use of the Baldwin Effect [2], by intertwining the learning mechanism with the evolutionary process and making the adaptable parameters heritable.

Note that in no case is any Lamarckian evolution used, i.e. although the multiplier starting values are adapted over the course of a lifetime, they are never written back to the genome and are therefore not inherited.
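The separation between heritable starting values and lifetime-adapted values can be illustrated with a short sketch. The `Genome` and `Robot` classes and their fields are hypothetical names introduced for this example, and mutation is deliberately left out because its details are not given here.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Genome:
    learning_rate: float                                      # LR
    learning_sign: float                                      # LS
    inherited_multipliers: dict = field(default_factory=dict)  # starting values


class Robot:
    def __init__(self, genome):
        self.genome = genome
        # Lifetime copy: learning only ever modifies this dictionary.
        self.multipliers = dict(genome.inherited_multipliers)

    def learn(self, token_type, new_value):
        """Store a lifetime-learned multiplier; the genome is left untouched."""
        self.multipliers[token_type] = new_value

    def offspring_genome(self):
        """Return the genome to broadcast (an evolutionary operator would mutate it here)."""
        return copy.deepcopy(self.genome)
```

Because `offspring_genome` copies the genome and never reads `self.multipliers`, anything learned during a lifetime is lost at reproduction, which is the non-Lamarckian behaviour described above.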


6.4 Hypotheses

The following alternative hypotheses inform the experimental design and are tested through experimental investigation.

Hypothesis 7 The effectiveness of different individual adaptation settings is influenced by the parameters of the environment (token count, token value).

Hypothesis 8 The rate of change of a given environment influences the effectiveness of the individual adaptation mechanism.

Hypothesis 9 The nature of the individual adaptation mechanism influences performance in different environments.

6.5 Experiments

Experiments are carried out using mEDEA_rf as described in Section 3.4. Experimental and simulation parameters are given in table 6.1. Parameters associated with the learning mechanism are given in table 6.2. The values for LR_initial and LR_max were selected following limited empirical exploration.

Table 6.1 Simulation and experimental parameters for all experiments in this chapter.

Simulation parameters
  Arena size                         1024 px × 1024 px
  Max. robot lifetime                2500 iterations
  Token re-spawn time                500 iterations
  Sensor range                       196 px
  Max. communication range, r_max    128 px

Experimental parameters
  Number of independent runs         30
  Number of robots                   100
  Max. iterations                    1,000,000

Table 6.2 Learning parameters with initial values and ranges in which they can change during runtime of the experiment. Method of adaptation (through evolution or lifetime-learning) depends on the experiment.

  Parameter                          Init. value    Value range
  Learning rate, LR                  1.02           [LR_min, LR_max]
  Minimum LR, LR_min                 1              fixed
  Maximum LR, LR_max                 1.5            fixed
  Multiplier of type x, m_x          random         [−1, 1]
  Inherited multiplier, im_x         random         [−1, 1]
  Learning sign, LS                  random         [−1, 1]