The experimental data on performance showed a clear advantage for net- works with modulatory neurons. Yet, the link between performance and characteristics of the networks was not easy to find due to the large vari- ety of topologies and plasticity rules that evolved from independent runs. Figure6.20 shows an example of a network that solved the double T-maze. The neural topology, number of neurons and plasticity rule may vary con- siderably across evolved networks that performed equally well.
Nonetheless, it was possible to check if the better performance in the double T-maze agents evolved with neuromodulated plasticity was corre- lated with a differential expression of modulatory and standard neurons. The architecture and composition of the network are modified by genetic operators that insert, duplicate and delete neurons. The average number of the two types of neurons was measured in evolving networks for the condi- tion where plasticity was not affected by modulation (Figure 6.22, top left graph) and for the condition where plasticity was affected by modulatory inputs (Figure 6.22, bottom left graph). In both conditions, the number of modulatory neurons was higher than the number of standard neurons.
0 200 400 600 800 1000 0
100 200
Modulatory Effect Enabled
Fitness 1200 1400 1600 1800 2000 0 100 200
Modulatory Effect Disabled
Fitness 1 2 3 4 Number of Neurons
Neuron insertion/duplication active
0 1 2 3 4 5 No neuron insertion/duplication Number of Neurons Fitness Standard Neurons Modulatory Neurons
Figure 6.22: Fitness (continuous line) and number of inner neurons (dashed lines for standard and dotted lines for modulatory) in networks during evo- lution (average values of 50 independent runs).
However, the presence of modulatory neurons when those were not active (top left graph) depended only on insertion, duplication and deletion rates, whereas in the case when they were enabled (bottom left graph) their pres- ence might be linked to a functional role. This fact was suggested by the higher value of the mean fitness.
In a second phase, the evolutionary experiments were run for additional thousand generations, but the probability of inserting and duplicating neu- rons was set to zero, while the probability of deleting neurons was left unchanged. In both conditions all types of neurons slightly decreased in number. However, modulatory neurons completely disappeared in the con- dition where they had no effect on plasticity (Figure 6.22, top right graph)
0 200 400 600 800 1000 1200 1400 −10 0 10 20 30 40 50 60 Generations
Fitness of modulatory networks minus
fitness of plastic networks
single T−maze
double T−maze
double T−maze homing
Figure 6.23: Differential measurements of fitness. Each line represents the difference in fitness between runs with modulatory neurons and runs with- out (each set is represented by the median fitness from 50 independent runs). The evolutionary advantage of modulatory neurons is described by the tendency of values of being positive. The evolution in the single T-maze was stopped at generation 600 as it reached stationary conditions.
while on average two modulatory neurons were observed in the condition where modulation could affect plasticity. This represents a further indi- cation that neuromodulation of synaptic plasticity is responsible for the higher performance of the agents in the double T-maze and that they play a functional role in guiding reward-based learning.
Comparing the box plots in Figures 6.18and 6.19, it appears that mod- ulatory neurons provided a considerable advantage in the double T-maze
with homing, but not so in the single T-maze. This finding led to the hypothesis that modulatory neurons are advantageous when the problems increase in difficulty. To verify this statement, the evolutionary progress of three experiments with increasing complexity were compared: 1) a single T-maze with homing, 2) a double T-maze without homing and 3) a double T-maze with homing. In the first problem, a 2-armed bandit problem, there are two actions to be repeated each trial: an outgoing direction and a re- turn direction. The return direction is always the opposite of the outgoing direction. In the second problem, a 4-armed bandit problem, two outgoing directions (two consecutive turns) must be learnt. In the third problem, two outgoing and two return directions must be learnt. It was therefore assumed that the three problems were ordered by increasing difficulty9. The median fitness values from two sets (one with modulatory neurons and one without modulatory neurons) of 50 independent runs were compared. Figure 6.23
shows the difference between the fitness with modulatory neurons and the fitness without modulatory neurons. A positive line indicates an advantage with modulatory neurons, a negative line indicates a disadvantage. The dif- ferential fitness at the beginning of evolution is negligible, meaning that the initial random networks performed similarly whether they had or had not modulatory neurons. However, while evolution progressed, networks that could receive modulatory neurons by random mutations increased rapidly their performance manifesting a significant gap with networks that could not employ modulatory neurons. It is possible to note that the evolution-
9It is important to note that the finding that neuromodulation gives an evolutionary
advantage in increasingly complex problems derived from the experiments. The compar- ison of fitness progress among experiments with different complexity proposed hereafter was conceived as part of the analysis, and it cannot be considered part of the methodol- ogy.
ary advantage appeared related to the problem complexity: whereas in the single T-maze there is only a minimal difference in fitness, and a slight disadvantage of modulatory neurons initially, the double T-maze and the double T-maze with homing benefit considerably by the presence of mod- ulatory neurons. Ideally, if the advantage manifests itself only in speed of evolution, after a period in the positive area, the lines would approach null values towards the end of evolution, implying that the advantage is evolu- tionary only, and not computational. On the contrary, when lines stabilise at values different from zero, two cases are possible: either the limitations of the evolutionary algorithm did not allow for the successful evolution of one of the two sets, or the networks of one of the two sets have a computa- tional advantage over the others. Figure 6.23 supports the hypothesis that modulated plasticity evolves to benefit networks in increasingly complex problems.
A further test was conducted on the evolved modulatory networks when the evolutionary process was completed. Networks with high fitness that evolved modulatory neurons were tested with modulation disabled. The test revealed that modulatory networks, once deprived of modulatory neu- rons, were still capable of navigation by turning at the required points and maintaining straight navigation along corridors. The low level navigation was preserved and the number of crashes did not increase. However, most networks seemed capable of turning only in one direction (i.e. always right, or always left), therefore failing to perform homing behaviour. None of the networks appeared to be capable of reward-seeking behaviour, curiously evoking anhedonic behaviour (Berridge and Robinson, 1998). Generally, networks that were evolved with modulation and that were downgraded to plastic networks (by disabling modulatory neurons) performed worse than those evolved without modulatory neurons. Hence, it can be assumed that
modulatory neurons are employed to design a different neural dynamics that, according to the experiments, were easier to evolve, and on average empowered solutions with an important advantage.
6.5.5
Conclusion
The model of neuromodulation described here applies a multiplicative effect on synaptic plasticity at target neurons, effectively enabling, disabling or modulating plasticity at specific locations and times in the network. The evolution of network architectures and the comparison with networks un- able to exploit modulatory effects showed the advantages brought in by neuromodulation in environments characterised by distant rewards and un- certainties. The increased complexity of the problems appeared to outline distinctly the advantages of neuromodulated plasticity that was more evi- dent in the most difficult problems. The random insertion of modulatory neurons appeared not to affect significantly the search on the single T- maze where neuromodulation was not necessary. A correspondence between performance and architectural motifs was not observed, however it can be assumed that the unconstrained topology search combined with different evolved plasticity rules allowed for a large variety of well performing struc- tures. In this respect, the search space was explicitly unconstrained in order to assess modulatory advantages independently of specific or hand-designed neural structures. In this condition, the phylogenetic analysis of evolving networks supports the hypothesis that modulated plasticity is employed to increase performance in environments where sparse learning events demand memorisation of selected and timed signals.