In Chapter 2, it was suggested that, in addition to the inherent variance in network solutions produced by training case generation and presentation, varying the design and specification of the constituent networks of an ensemble could also improve accuracy. This section describes an attempt to create and evaluate systematic variance in the conditioning of the data used to train the networks making up the ensemble.
6.4.1 Methods
Noise injection on the training dataset has previously been used to improve the generalisation performance of an individual network. An ensemble, however, requires diversity of network design so that the constituents cover enough of the solution space for the aggregated estimate to be useful. The magnitude of this diversity has not previously been examined in this application.
To investigate this effect in a systematic manner, a tool was produced (code in Appendix A20) to generate normal distributions of whole-number maximum noise injection percentages around a specified mean value and with a specified standard deviation. Validation of this tool is shown in Table 40.
Target Mean (SD)    Test Distribution Mean    Test Distribution Standard Deviation
5 (0)               5                         0
5 (1)               4.93                      1.01
5 (2)               5.09                      1.92
5 (5)               5.08                      5.81
5 (10)              5.16                      10.11
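The behaviour of such a tool can be illustrated with a minimal sketch. This is not the code in Appendix A20: the function name, its parameters and the use of simple rounding are assumptions made for illustration, and the treatment of negative draws is not stated in the text, so they are left unchanged here.

```python
import random
import statistics


def sample_noise_levels(mean_pct, sd_pct, n_networks, seed=None):
    """Draw one whole-number maximum noise percentage per network.

    Values are drawn from a normal distribution and rounded to the
    nearest whole number. Negative draws are kept as they are; how the
    original tool treats them is not stated, so this is an assumption.
    """
    rng = random.Random(seed)
    if sd_pct == 0:
        # Zero spread: every network receives the same maximum noise level.
        return [round(mean_pct)] * n_networks
    return [round(rng.gauss(mean_pct, sd_pct)) for _ in range(n_networks)]


# Check a generated distribution against its target, as in Table 40.
levels = sample_noise_levels(mean_pct=5, sd_pct=2, n_networks=100, seed=1)
print(statistics.mean(levels), statistics.stdev(levels))
```

As in Table 40, the mean and standard deviation measured on a finite sample would be expected to sit close to, but not exactly on, the target values.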
This distribution of network specifications was then used to control the construction of the datasets for network training, with each dataset built to a particular specification and the ensemble as a whole having a consistent and measurable distribution of training noise injection parameters.
In these tests, the instrumented socket described in Chapter 4 was used. Network parameters (with the exception of noise injection) were identical to those used in Chapter 5 with the same training end procedure.
Network performance was evaluated using a separately created set of superposition loads generated from a distinct set of measurements. In contrast to earlier work, this consisted of superposition cases only, rather than a mix of superposition and isolated loads. Although this is arguably a more realistic means of assessment, results are not directly comparable to previous work. Error in this study is likely to be around 2% lower than in section 6.2 – this was estimated by supplying an ensemble with a loading file of the type described earlier.
As described above, the mean of the maximum noise injection magnitude was set to one of five values in this study: 0%, 5%, 10%, 15% and 20%. The standard deviation of the distribution around this mean was set to 0 (i.e. identical maximum noise injection for every network), 1, 2, 5 or 10. Due to the nature of noise injection, it is applied either side of the original value. One ensemble of 100 networks was trained for each of the 25 combinations of mean and standard deviation, for a total of 2500 networks.
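The experimental grid and the two-sided nature of the noise can be sketched as follows. The constant and function names are hypothetical, and the use of uniformly distributed noise scaled to each value is an assumption; the text states only that noise is applied either side of the original value up to a maximum percentage.

```python
import random
from itertools import product

MEANS_PCT = [0, 5, 10, 15, 20]   # centre of the maximum-noise distribution
SDS_PCT = [0, 1, 2, 5, 10]       # spread of that distribution
NETWORKS_PER_ENSEMBLE = 100

# One ensemble per (mean, SD) pair: 5 x 5 = 25 ensembles.
ensemble_configs = list(product(MEANS_PCT, SDS_PCT))
total_networks = len(ensemble_configs) * NETWORKS_PER_ENSEMBLE


def inject_noise(values, max_noise_pct, rng):
    """Perturb each value by up to +/- max_noise_pct percent of itself.

    Uniform noise is an assumption; the source only states that noise is
    applied on either side of the original value.
    """
    limit = abs(max_noise_pct) / 100.0
    return [v * (1.0 + rng.uniform(-limit, limit)) for v in values]


rng = random.Random(0)
noisy = inject_noise([10.0, 20.0, 30.0], max_noise_pct=5, rng=rng)
print(total_networks)  # 2500
print(noisy)
```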
6.4.2 Results
Results of RMS error for constituent networks are shown in Table 41, along with the standard deviation of each ensemble distribution. Results for each ensemble estimate, along with the percentile that the ensemble estimate would occupy within its group of constituent networks, are given in Table 42. The percentile value is a measure of the relative effectiveness of using the ensemble. A low percentile indicates that very few networks could be expected to outperform the ensemble; a higher percentile means that the ensemble error is closer to that of an average single network. For statistical comparisons between ensembles and constituents, a Bonferroni correction was applied to compensate for the number of tests (p<0.002, Table 43). The standard deviation of ensemble solutions was set to 0.16 using the estimate in section 6.2.
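The two quantities described above can be sketched as below. The exact percentile calculation used for Table 42 is not stated, so the empirical rank shown here is an assumption, as is the uncorrected significance level of 0.05 (chosen because 0.05 divided by 25 ensembles gives the stated boundary of 0.002). The error values are hypothetical.

```python
def percentile_rank(ensemble_error, constituent_errors):
    """Percentage of constituent networks with lower error than the ensemble."""
    better = sum(1 for e in constituent_errors if e < ensemble_error)
    return 100.0 * better / len(constituent_errors)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical constituent RMS errors (%), for illustration only.
constituents = [4.1, 4.6, 4.9, 5.2, 5.3, 5.5, 5.8, 6.0, 6.4, 7.1]
print(percentile_rank(4.5, constituents))  # 10.0: one network in ten is better
print(bonferroni_threshold(0.05, 25))      # 0.05 / 25
```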
            Variation (%)
Mean (%)    0              1              2              5              10
0           5.45 (0.91)    5.32 (0.98)    5.17 (0.81)    5.09 (0.67)    5.13 (0.60)
5           5.04 (0.61)    5.10 (0.58)    5.02 (0.58)    5.22 (0.70)    5.00 (0.58)
10          5.05 (0.53)    5.11 (0.66)    4.94 (0.67)    5.06 (0.54)    4.94 (0.51)
15          4.93 (0.44)    4.87 (0.46)    4.78 (0.44)    4.95 (0.53)    4.87 (0.59)
20          4.71 (0.40)    4.81 (0.36)    4.76 (0.37)    4.72 (0.46)    4.75 (0.60)
Table 41 – RMS % error (SD) of the mean constituent network of each created ensemble
            Variation (%)
Mean (%)    0              1              2              5              10
0           3.91 (5.9)     3.86 (4.7)     3.95 (3.8)     4.12 (2.8)     4.32 (6.5)
5           4.21 (7.1)     4.33 (6.7)     4.47 (15.0)    4.30 (5.1)     4.26 (9.0)
10          4.36 (6.0)     4.47 (12.1)    4.26 (7.3)     4.42 (11.0)    4.21 (5.9)
15          4.41 (10.2)    4.37 (12.7)    4.27 (9.5)     4.42 (18.0)    4.14 (6.3)
20          4.28 (15.2)    4.38 (10.6)    4.31 (6.1)     4.23 (11.5)    4.19 (15.2)
Table 42 – RMS % error (percentile) of constructed ensembles. The second value indicates the percentile this ensemble would form within the group of constituents. In the absence of extensive repeatability measurement of ensembles, this figure represents the relative effectiveness of using an ensemble over an averagely performing network.

            Variation (%)
Mean (%)    0           1           2           5           10
0           <0.0001*    <0.0001*    <0.0001*    <0.0001*    <0.0001*
5           <0.0001*    <0.0001*    0.0036      0.0001*     <0.0001*
10          <0.0001*    0.0029      0.0019*     0.0003*     <0.0001*
15          0.0002*     0.0009*     0.0004*     0.0026      0.0002*
20          0.0011*     0.0004*     0.0002*     0.0012*     0.0036
Table 43 – Calculated p values from paired t-test comparisons of each ensemble to the group of constituent networks. * indicates significance following correction
An ANOVA test was used to assess for any significant differences between different ensemble constructions: a post-hoc Tukey test was employed to identify the significant comparisons. The significance boundary was set to p<0.0000833 due to the large number of comparisons made.
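For reference, the statistic underlying a one-way ANOVA can be computed as in the sketch below. It illustrates the general technique only: it returns the F statistic, while the p value and the post-hoc Tukey comparisons would normally be obtained from a statistics package. The function name and the sample data are hypothetical and are not taken from this study.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of sample groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [mean(g) for g in groups]
    grand_mean = mean([x for g in groups for x in g])
    # Between-group sum of squares: spread of group means about the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares: spread of samples about their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical per-case errors (%) for three ensemble configurations.
errors_by_config = [[3.8, 3.9, 4.0], [4.2, 4.3, 4.4], [4.3, 4.5, 4.4]]
print(one_way_anova_f(errors_by_config))
```

A large F value indicates that the variation between configurations is large relative to the variation within them, which is the condition under which post-hoc pairwise comparisons become informative.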
Group 1 (Mean %/Variation %)    Comparison (Mean %/Variation %)
(0/0)                           (0/10), (5/1), (5/2), (10/0), (10/1), (10/5), (15/0), (15/1), (15/5), (20/1), (20/2)
(0/1)                           (0/10), (5/1), (5/2), (5/5), (5/10), (10/0), (10/1), (10/2), (10/5), (15/0), (15/1), (15/2), (15/5), (20/0), (20/1), (20/2)
(0/2)                           (5/2), (10/0), (10/1), (10/5), (15/0), (15/1), (15/5), (20/1)
Table 44 – Estimation of significant differences between constructed ensembles following an ANOVA test. Only significant differences following a multiple comparison correction are shown
In Table 44, significant differences between ensemble estimates were identified in several comparisons; in particular, the very low noise configurations performed significantly better than many of the higher noise set-ups. In all cases, the configuration on the left had lower error than those on the right. The greatest number of significant improvements was demonstrated by the ensemble with the lowest overall error: that with a standard deviation of 1 about a mean noise addition of 0%. None of the very low noise configurations were significantly better than each other.
6.4.3 Discussion
The use of variable-specification training data in the production of neural networks has a significant impact on the accuracy of the ensemble. In addition to significant improvements over the mean constituent networks, ensembles with low noise and low variation demonstrated significantly better accuracy than those with higher mean maximum noise and larger variation.
This is thought to be because networks trained with a high degree of noise, although they generalise well and, on average, perform well on test data, are less able to represent the underlying transfer function precisely.
Removing the ‘isolated’ loads from the testing file decreased overall RMS error, both for ensembles and for mean constituent networks. This supports the notion that this form of applied load is particularly poorly represented by both networks and ensembles unless specific measures are taken to counteract this.
By combining results into an ensemble, the requirements for generalisation on an individual network are reduced, as the generalisation ability is adequately represented by the inter-network variation in the approximation of the transfer function. Noise injection therefore becomes less useful, as there seems to be sufficient variance in constituent performance to produce an accurate estimate.
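The averaging effect behind this argument can be illustrated with a small simulation. It is a sketch under stated assumptions rather than a model of the networks in this study: each hypothetical network's error is taken to be a component shared by all networks plus an independent component of its own, and all names and magnitudes are invented for illustration.

```python
import math
import random


def rms(errors):
    """Root-mean-square of a list of errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def simulate(n_networks=100, n_cases=200, shared_sd=4.0, own_sd=3.0, seed=0):
    """Return (mean constituent RMS error, ensemble RMS error)."""
    rng = random.Random(seed)
    # Error common to every network on a given test case (hypothetical).
    shared = [rng.gauss(0.0, shared_sd) for _ in range(n_cases)]
    # Each network adds its own independent error to the shared one.
    network_errors = [
        [s + rng.gauss(0.0, own_sd) for s in shared] for _ in range(n_networks)
    ]
    # Ensemble estimate: mean of the constituents, so its error is the
    # mean of the constituent errors on each case.
    ensemble_errors = [sum(col) / n_networks for col in zip(*network_errors)]
    mean_constituent = sum(rms(e) for e in network_errors) / n_networks
    return mean_constituent, rms(ensemble_errors)


mean_constituent_rms, ensemble_rms = simulate()
print(mean_constituent_rms, ensemble_rms)
```

Averaging removes most of the independent component but none of the shared one, so in this sketch the ensemble error is never greater than the mean constituent error, and the benefit is limited by the error the networks have in common.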