6.2.1 Aims of the Experiment

Inclusion of the preliminary training stage in the training regime of the proposed model is intended to resemble how learning to count progresses in children. Since such an approach is not standard from the point of view of the usual practice of neural network training, it is appropriate to examine if this has any impact on the final performance of the proposed model. At the same time, such an experiment is expected to provide an answer to the first research question posed in the present thesis: how does mastering the count list prior to learning to count within the respective range of collection sizes affect the subsequent process of learning to count?

Addressing this question was the primary aim of this simulation. In addition, the ability of the model to generalise, that is, to count arrangements of items that the model has never encountered during training, was assessed.

6.2.2 Procedure

In this simulation the model was used in the configuration depicted in figure 22, that is, it was trained to count using both visual and proprioceptive information, with the latter acting as an input to the network. This configuration was chosen because it is the primary model set-up used in the later simulations. In order to investigate the impact of the preliminary training stage on the outcome of the second training stage, the training protocol introduced in section 5.3 was modified in the following way. Once the model in the initial configuration had been created (figure 12), a copy of it was made in order to preserve identical initial weights of the network connections. One of the copies was then subjected to the preliminary training stage followed by the second stage, while the other proceeded directly to the second stage of the training. The first network thus acquired the ability to recite a sufficiently long count list before proceeding with learning to count items, while the second one had to acquire both of these skills at the same time. In both cases, identical training data sets were used during the second stage of the training (the training data sets did, of course, differ between trials, as did the initial weights in the network).
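The paired protocol described above can be sketched as follows. This is only an illustrative outline, not the code used in the thesis: `make_network` and `train` are hypothetical stand-ins (the real model and training routine are those of section 5.3), and the epoch counts are the ones reported for this simulation.

```python
import copy
import random


def make_network(n_hidden, seed):
    """Hypothetical stand-in for the model: random initial weights plus a training log."""
    rng = random.Random(seed)
    return {"weights": [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)], "log": []}


def train(network, stage, epochs):
    """Placeholder for the real training routine: it only records what was run."""
    network["log"].append((stage, epochs))
    return network


def run_trial(n_hidden, seed):
    """Create one network, copy it, and train the two copies under the two regimes."""
    base = make_network(n_hidden, seed)
    with_stage1 = copy.deepcopy(base)      # preliminary stage, then stage 2
    without_stage1 = copy.deepcopy(base)   # stage 2 only, prolonged to compensate

    train(with_stage1, "stage1", 700)
    train(with_stage1, "stage2", 4000)
    train(without_stage1, "stage2", 4064)
    return base, with_stage1, without_stage1


base, net_a, net_b = run_trial(n_hidden=20, seed=0)
print(net_a["log"])
print(net_b["log"])
```

Using a deep copy rather than re-initialising with the same seed makes the guarantee of identical starting weights explicit and independent of how the random number generator is consumed elsewhere.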

In order to compensate for the additional training received by the first network, the second stage of the training of the second network was prolonged from 4000 to 4064 epochs (there are 2 sequences in the training data set in stage 1 and 22 in stage 2, and therefore 700 training epochs in stage 1 correspond, in terms of the number of weight updates performed, to approximately 64 epochs in stage 2). The training parameters in both stages were as reported in section 5.3.
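The compensation arithmetic can be checked with a few lines of Python. The sketch assumes one weight update per training sequence per epoch, which is how the text's comparison "in terms of the number of weight updates" is read here; the function name is illustrative.

```python
import math


def equivalent_stage2_epochs(stage1_epochs, stage1_sequences, stage2_sequences):
    """Stage 2 epochs that perform as many weight updates as the whole of stage 1.

    Assumes one weight update per sequence per epoch.
    """
    stage1_updates = stage1_epochs * stage1_sequences
    return stage1_updates / stage2_sequences


extra = equivalent_stage2_epochs(stage1_epochs=700, stage1_sequences=2, stage2_sequences=22)
print(round(extra, 2))          # 63.64
print(4000 + math.ceil(extra))  # 4064
```

Rounding up rather than down ensures that the second network receives at least as many weight updates as the first one.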

The experiment was repeated 30 times for NH = 15, 20 and 25, yielding a total of 90 trials. After training, the models were evaluated on two types of test data sets. The first data set, identical for all 90 trials, consisted of 50 collections (5 different examples for every number from 1 to 10) in arrangements that had never been shown to any of the networks during the training. The second data set, specific to every trial (but identical for the two networks within a trial), consisted of 50 arrangements of items chosen from those shown to the networks during training.

The number of examples from a test data set counted correctly served as an index of the performance of the model.
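As a minimal illustration of this index, the sketch below scores a set of predicted counts against the true collection sizes. The layout of the test set (5 examples for each number from 1 to 10) follows the text; the predicted values are made up for the example and are not results from the thesis.

```python
def performance_index(predicted, expected):
    """Number of test collections whose predicted count equals the true count."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    return sum(1 for p, e in zip(predicted, expected) if p == e)


# Test set layout from the text: 5 examples for every number from 1 to 10.
expected = [n for n in range(1, 11) for _ in range(5)]

# Hypothetical model output: correct everywhere except for two collections.
predicted = list(expected)
predicted[7] += 1
predicted[49] -= 1

score = performance_index(predicted, expected)
print(score)  # 48
```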

6.2.3 Results

The experimental set-up described above corresponds to a mixed-design 2 × 2 × 3 (stage 1 presence or absence × test data set known or unknown × NH = 15, 20, or 25) repeated-measures ANOVA, with NH as the between-subjects factor and the number of test examples counted correctly as the dependent measure. The difference in counting accuracy caused by inclusion versus omission of the preliminary training stage and that caused by the type of the test data set were two planned contrasts. In 3 trials, all with NH = 15, the preliminary training stage did not finish with a successful acquisition of the count list by the model. These trials were therefore discarded, which left 87 trials available for the statistical analysis. Statistically significant effects of the stage 1 inclusion (F = 965.954, p < 0.001, partial η² = 0.920), of the test data set type (F = 105.278, p < 0.001, partial η² = 0.556), and of NH (F = 6.739, p = 0.002, partial η² = 0.138) were found. In addition, the interaction between the within-subjects factors was significant (F = 10.713, p = 0.002, partial η² = 0.113). NH did not interact with the within-subjects factors. The profile plots of the estimated marginal means for the stage 1 versus data set type interaction and for NH are shown in figure 23.
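The reported effect sizes can be cross-checked against the F values using the standard identity partial η² = (F · df_effect) / (F · df_effect + df_error). The degrees of freedom are not stated in the text; the sketch below assumes df_error = 87 − 3 = 84 (87 trials in 3 NH groups), df_effect = 1 for the two-level within-subjects factors and their interaction, and df_effect = 2 for NH.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Assumed error degrees of freedom: 87 analysed trials minus 3 NH groups.
DF_ERROR = 87 - 3

# (F value from the text, assumed effect degrees of freedom)
effects = {
    "stage 1 inclusion": (965.954, 1),
    "test data set type": (105.278, 1),
    "NH": (6.739, 2),
    "stage 1 x data set type": (10.713, 1),
}

recovered = {
    name: partial_eta_squared(f_value, df_effect, DF_ERROR)
    for name, (f_value, df_effect) in effects.items()
}

for name, value in recovered.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed degrees of freedom, the recovered values agree to three decimal places with the reported effect sizes (0.920, 0.556, 0.138 and 0.113).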

The effect of NH and the interaction between the within-subjects factors were investigated in a post-hoc analysis, with an assumed significance level of α = 0.01.

Figure 23: Profile plots for the simulation 2 ANOVA. The plots show the number of examples from the test data set (out of 50) counted correctly by the model after the second stage of the training. (a) illustrates the stage 1 inclusion versus dataset type interaction. (b) shows the effect of the size of the hidden layer in the model. The error bars indicate 95% confidence intervals. All possible pairwise comparisons in (a) are significant at α = 0.01 (see text). The star indicates the only statistically significant difference (at α = 0.01) in (b).

Pairwise comparisons between the three levels of NH (with α adjusted for multiple comparisons using the Holm-Bonferroni method) indicated that the only statistically significant difference was that between NH = 15 and NH = 25 (p = 0.002; the other p values were p = 0.035 for NH = 20 against NH = 25 and p = 0.924 for NH = 15 against NH = 20). Pairwise comparisons for the stage 1 inclusion versus dataset type interaction, based on paired-samples t-tests adjusted for multiple comparisons using the Holm-Bonferroni method, indicated all 6 comparisons to be statistically significant at the assumed α (all p < 1 × 10⁻⁵). As is evident in figure 23a, the discovered interaction is of the ordinal type; its presence therefore does not invalidate the discussion of the discovered main effects of stage 1 inclusion and of the dataset type.
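For reference, the Holm-Bonferroni step-down procedure applied to the three NH comparisons can be sketched as follows. The sketch assumes that the p values quoted in the text are unadjusted and are compared against step-wise adjusted thresholds; the function name and the dictionary keys are illustrative.

```python
def holm_bonferroni(p_values, alpha):
    """Holm-Bonferroni step-down test.

    p_values maps a comparison label to its unadjusted p value.
    Returns a dict mapping each label to True (rejected) or False.
    """
    ordered = sorted(p_values, key=p_values.get)  # smallest p value first
    m = len(ordered)
    decisions = {}
    still_rejecting = True
    for rank, label in enumerate(ordered):
        threshold = alpha / (m - rank)  # alpha/m, alpha/(m-1), ..., alpha/1
        if still_rejecting and p_values[label] <= threshold:
            decisions[label] = True
        else:
            # Once one hypothesis is retained, all later ones are retained too.
            still_rejecting = False
            decisions[label] = False
    return decisions


nh_comparisons = {
    "NH=15 vs NH=25": 0.002,
    "NH=20 vs NH=25": 0.035,
    "NH=15 vs NH=20": 0.924,
}
result = holm_bonferroni(nh_comparisons, alpha=0.01)
print(result)
```

With α = 0.01, the smallest p value (0.002) is compared against 0.01 / 3 ≈ 0.0033 and is significant, while the next one (0.035) exceeds 0.01 / 2 = 0.005, so the procedure stops there. This matches the single significant difference reported for NH.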

6.2.4 Discussion

The statistical analysis indicates that the counting accuracy of the model was affected by three factors:

• whether the model was tested on known or unknown item arrangements;

• whether the preliminary training stage was included or not;

• the number of hidden units in the network.

As shown in figure 23a, the proposed neural network tended to achieve better scores on the known test data set than on the unknown one, and when the preliminary training stage was included than when it was not. The latter is an especially important result in the context of the considered research question. It shows that, in addition to being justified from the theoretical standpoint by the data from experimental psychology, the introduction of the preliminary training stage brought tangible benefits in terms of the final counting performance of the model. Note that, in addition, a statistically significant interaction between the stage 1 inclusion and the dataset type was found. The slopes in figure 23a indicate that when the preliminary training stage was included in the model training, the counting performance of the model was less affected by whether the model was tested on the known or on the unknown arrangements of items. In other words, the preliminary training stage not only allowed the model to achieve higher counting accuracy within the given amount of training, but also enabled it to generalise better.

The mechanism explaining the contribution of the preliminary training stage to the model’s counting accuracy is most likely the following. A successful preliminary training stage equips the model with an ability to produce a correct count list, the length of which is sufficient to count any set that is presented to the model during the second training stage. The task of the model during the second stage is therefore only to learn to ‘modulate’ its output based on the visual and proprioceptive information present at the input, rather than to learn the entire task from scratch. Evidently, the state in which the weights of the network are left after the preliminary stage biases the subsequent training in such a way that better counting accuracy can be achieved within a comparable amount of training. Although limited to the scope of the considered, largely simplified scenario, the above finding may be interpreted as computational evidence that being equipped with a sufficiently long count list prior to learning to count collections within the respective number range makes the latter task easier. While this does not mean that children’s acquisition of the appropriate portion of the count list before learning to use it to count items is a necessary developmental step in the Piagetian sense, it is possible that this allows the subsequent learning process to be sped up, perhaps to the point where it actually happens within a reasonable time, or even within the lifetime of the individual.

As the conducted statistical analysis indicates, the network’s final counting performance was also affected by whether or not the test data set consisted of item arrangements that were shown to the network during the training. However, the size of this effect was small in comparison to that of the preliminary training stage, as seen in figure 23a and in the obtained partial η² values. Overall, it can be said that the model generalised quite well from the arrangements on which it had been trained to the novel sets of items, with only a modest drop in the counting accuracy. In accordance with this finding, in the subsequent simulations the test data sets were composed solely of item arrangements that had not been shown to the model during the training.

Finally, the counting accuracy was also affected by NH, although to a rather modest degree (as indicated in figure 23b and by the low partial η²). Importantly, the size of the hidden layer in the network did not interact with the other factors, which means that the discovered effects were robust across the considered range of NH. Since the size of the hidden layer did not affect, in the considered range of its values, the ability of the proposed neural network to generalise, in subsequent simulations NH was simply fixed to 20.

6.3 Simulation 3 — Contribution of the Counting