When FF-ANNs, CNNs, or RNNs are used for regression problems, there are two types of prediction that one may want to obtain for a given input: first, an estimate of the underlying non-linear function of interest; second, an estimate of the target value itself. It is crucial to associate a measure of confidence with each of these estimates. However, several aspects of ANN modelling propagate additional uncertainty into the predicted quantity of interest. These issues are discussed in the following sections.
3.2.3.1 Uncertainty from Sampling Variability in Training Data Set
A portion of the total uncertainty in the predicted values is attributable to the inherent uncertainty in the input data. From a probabilistic point of view, the data set used for training the network is only one of an infinite number of possible data sets that could be drawn within the given input domain and from the underlying statistical error distribution. In other words, the variability in the training data set arises from the variability in the sampling of the input vectors and from the random fluctuation of the corresponding target outputs.
3.2.3.2 Uncertainty from ANN Weight Parameters
Another source of uncertainty affecting the value predicted by the ANN is the random initialization of its weight parameters. When an ANN with a fixed architecture is trained repeatedly on the same training data, each training run yields a network with different performance. Consequently, this phenomenon gives rise to a model selection problem.
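The effect of random weight initialization can be illustrated with a minimal sketch. The tiny one-hidden-layer network, the sine target, the learning rate, and the seeds below are all assumptions chosen for illustration and are not taken from this work; only the initialization seed changes between runs, while the data and the training procedure stay fixed.

```python
import math
import random


def train_tiny_ann(xs, ys, seed, n_hidden=4, epochs=200, lr=0.05):
    """Train a 1-input, 1-output network with one tanh hidden layer by SGD.

    Only the seed of the weight initialisation differs between calls; the
    data and the order in which samples are visited are identical.
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]  # input -> hidden
    b1 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]  # hidden biases
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]  # hidden -> output
    b2 = 0.0

    for _ in range(epochs):
        for x, y in zip(xs, ys):
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(n_hidden)]
            err = sum(w2[j] * h[j] for j in range(n_hidden)) + b2 - y
            for j in range(n_hidden):
                # Error back-propagated through the tanh unit (uses w2 before update).
                grad_h = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                w1[j] -= lr * grad_h * x
                b1[j] -= lr * grad_h
            b2 -= lr * err

    def predict(x):
        return sum(w2[j] * math.tanh(w1[j] * x + b1[j]) for j in range(n_hidden)) + b2

    return predict


# Fixed training data (hypothetical target function, for illustration only).
train_x = [i / 10.0 for i in range(-10, 11)]
train_y = [math.sin(3.0 * x) for x in train_x]

# Same architecture, same data, five different initialisations.
predictions = [train_tiny_ann(train_x, train_y, seed)(0.35) for seed in range(5)]
spread = max(predictions) - min(predictions)
print("predictions at x = 0.35:", predictions)
print("spread across initialisations:", spread)
```

The spread of the five predictions at the same input gives a rough indication of how much of the prediction uncertainty is due to initialization alone.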
3.2.3.3 Uncertainty from the Model Structure
Furthermore, using an ANN for regression involves finding an appropriate model structure f(x) that describes the given example data. Generally, the relationship f(x) is not known; it is often complex and non-linear, so the network used to describe it may have hundreds or even thousands of weight parameters. Ideally, good performance is achieved by selecting an ANN of optimal complexity, where optimality is defined as the smallest network structure that adequately captures the underlying relationship. However, determining the optimal complexity is one of the most difficult tasks in designing an ANN, as there is no systematic method that ensures the optimal network will be chosen. The flexibility in ANN complexity lies primarily in the number of hidden layers and the number of neurons within these layers, which together determine the number of weights in the model. A balance is therefore required between having too few hidden nodes, so that there are insufficient degrees of freedom to capture the underlying relationship, and having too many hidden nodes, so that the model fits the noise in the individual data points rather than the general trend underlying the data as a whole. The latter case is referred to as over-fitting, which is often difficult to detect but can significantly impair the performance of an ANN. To guard against over-fitting, cross-validation (e.g. k-fold) is often used during training. Apart from being more susceptible to over-fitting, a large ANN with many hidden nodes is inefficient to calibrate, its parameters and resulting predictions have a higher degree of associated uncertainty, and it is more difficult to extract information about the modelled function from its parameters. Selecting the minimum number of necessary hidden nodes can therefore be crucial to the performance of an ANN and its value as a prediction tool.
The most commonly used method for selecting the number of hidden nodes is trial and error: a number of networks are trained while the number of hidden nodes is systematically increased or decreased until the network with the best generalisability is found. Thereafter, the performance of the ANN is estimated by evaluating its out-of-sample performance on an independent test set using a goodness-of-fit measure such as the root mean squared error (RMSE) or the coefficient of determination (R²). However, this may not be practical if only limited data are available, since the test data cannot be used for training. Furthermore, if the test data are not a representative subset, the evaluation may be biased.
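The trial-and-error selection described above can be sketched as follows. To keep the example short and dependency-free, a piecewise-constant regressor stands in for the ANN, with its number of bins playing the role of the number of hidden nodes; this stand-in, the synthetic data, the candidate list, and all function names are assumptions made for illustration. Complexity is chosen by k-fold cross-validation, and the selected model is then scored on an independent test set with RMSE and R².

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_binned_mean(xs, ys, n_bins):
    """Stand-in regressor on [0, 1) (NOT an ANN): mean of y within each bin."""
    overall = sum(ys) / len(ys)
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    # Empty bins fall back to the overall mean of the training targets.
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[min(int(x * n_bins), n_bins - 1)]


def cross_validated_rmse(xs, ys, n_bins, k, rng):
    """Average held-out RMSE over k folds for one complexity setting."""
    scores = []
    for fold in k_fold_indices(len(xs), k, rng):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit_binned_mean([xs[i] for i in train], [ys[i] for i in train], n_bins)
        scores.append(rmse([ys[i] for i in fold], [model(xs[i]) for i in fold]))
    return sum(scores) / k


def noisy_target(x, rng):
    """Hypothetical data-generating process used only for this sketch."""
    return math.sin(2.0 * math.pi * x) + rng.gauss(0.0, 0.2)


data_rng = random.Random(0)
train_x = [data_rng.random() for _ in range(60)]
train_y = [noisy_target(x, data_rng) for x in train_x]
test_x = [data_rng.random() for _ in range(20)]   # independent test set
test_y = [noisy_target(x, data_rng) for x in test_x]

# Systematically increase the complexity; the same fold split is reused for each.
candidates = [1, 2, 4, 8, 16, 32]
cv_scores = {m: cross_validated_rmse(train_x, train_y, m, 5, random.Random(1))
             for m in candidates}
best_bins = min(cv_scores, key=cv_scores.get)

final_model = fit_binned_mean(train_x, train_y, best_bins)
test_pred = [final_model(x) for x in test_x]
print("selected complexity:", best_bins)
print("test RMSE:", rmse(test_y, test_pred), "test R^2:", r_squared(test_y, test_pred))
```

Because the test set is used only once, after the complexity has been fixed, the reported RMSE and R² are not influenced by the selection step.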
3.3 Chapter Summary
ANNs are a popular surrogate modelling technique that approximates the computational model with an inexpensive-to-evaluate function. ANNs were selected over other conventional surrogates because they perform better when the input space is high-dimensional. Although a classical multi-layer FF-ANN is good at approximating complex non-linear functions, it does not perform well on time series data. For this reason, dynamic ANNs that handle time series data adequately, such as CNNs and RNNs, have been developed. In general, additional uncertainty enters the predictions made by an ANN as a result of variability in the training data set, random initialization of the network weights, and the choice of model structure. The question naturally arises whether the uncertainties introduced by the ANN can be quantified to ensure robust and reliable predictions. Different approaches to quantifying ANN uncertainties are therefore explored in subsequent chapters.
Chapter 4
Robust Surrogate Models - Variability in Training Data
Computational models are commonly adopted in engineering practice because they can replace costly or infeasible physical experiments. However, these models suffer from high computational costs, which poses a challenge when performing simulation-based reliability and sensitivity analyses, since a large number of samples is required for robust estimation of the failure probability and the sensitivity indices. Additionally, in the reliability analysis of highly reliable systems, such as those used in nuclear engineering, the failure regions usually occupy a small part of the input domain, so a high number of model calls is required. Hence, surrogate models are built from a few training samples of the expensive model to reduce this computational cost. However, due to the variability in sampling training data from the input domain of the expensive model, important regions of the sample space can be missed, leading to over- or under-estimation of the quantity of interest by the surrogate model. In this chapter, the bootstrap technique is therefore adopted to deal with this type of problem. Furthermore, a novel stopping criterion is proposed for selecting the number of bootstrap models. To demonstrate the applicability and accuracy of the technique, it is used to compute the reliability and sensitivity indices of two analytical functions.
4.1 Background to the Bootstrap Technique
The bootstrap technique [59] is a distribution-free inference method that requires no prior knowledge of the statistical distribution of the underlying population of interest. The basic idea is to generate new samples from the observed data by sampling with replacement from the original data set. An ensemble of models is then trained, one on each bootstrap sample, and the quantity of interest is estimated from this ensemble of bootstrap models. In this chapter, the quantities of interest are the failure probability p_F and the sensitivity indices S_i and T_i.
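The resampling idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure developed in this chapter: the performance function, the straight-line surrogate used in place of an ANN, the sample sizes, and the fixed number of bootstrap models (50, with no stopping criterion) are all hypothetical. Each bootstrap surrogate gives its own Monte Carlo estimate of p_F, and the spread of these estimates reflects the variability in the training data.

```python
import random


def expensive_model(x):
    """Hypothetical performance function; failure is defined as g(x) < 0."""
    return 2.0 - x - 0.1 * x ** 2


def fit_line(xs, ys):
    """Least-squares straight line, used as a stand-in surrogate (not an ANN)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:  # degenerate resample with identical inputs: constant model
        return lambda x: my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def bootstrap_failure_probability(xs, ys, n_boot, mc_inputs, rng):
    """Return one Monte Carlo estimate of p_F per bootstrap surrogate."""
    n = len(xs)
    estimates = []
    for _ in range(n_boot):
        # Draw n indices with replacement from the original training set.
        idx = [rng.randrange(n) for _ in range(n)]
        surrogate = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        failures = sum(1 for x in mc_inputs if surrogate(x) < 0.0)
        estimates.append(failures / len(mc_inputs))
    return estimates


rng = random.Random(42)
train_x = [rng.gauss(0.0, 1.0) for _ in range(20)]      # few expensive model calls
train_y = [expensive_model(x) for x in train_x]
mc_inputs = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # cheap surrogate calls

n_boot = 50  # assumed fixed here; no stopping criterion is applied
estimates = bootstrap_failure_probability(train_x, train_y, n_boot, mc_inputs, rng)

ordered = sorted(estimates)
mean_pf = sum(estimates) / n_boot
lower = ordered[int(0.025 * n_boot)]   # approximate 2.5th percentile
upper = ordered[int(0.975 * n_boot)]   # approximate 97.5th percentile
print("mean p_F:", mean_pf, "approx. 95% interval:", (lower, upper))
```

The percentile interval is one simple way to summarise the ensemble; other summaries of the bootstrap estimates could be used in the same place.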