CHAPTER 2: The construction and dynamics of the fields in dispute
2.3 Magistrates and their freedom within the legal field
As mentioned in Chapter 1, our research aims to provide data scientists with methods to overcome the difficulties of tuning NNs. As described in Section 2.2, NN training is an iterative process, so it is important to define its starting point and stopping criterion correctly. It is equally important to define the architecture to be used and to avoid unit saturation during training. For this, we need to parameterise and initialise a NN correctly.
As mentioned, parameterisation is a difficult and time-consuming process that requires user expertise. A single combination of parameter values is unlikely to lead to high performance across all problems. We therefore aim to use MtL to help in the parameterisation process, by taking advantage of the experience acquired with previously trained NNs. For this, as explained in Section 2.3, we need to design a set of informative metafeatures capable of describing the NNs’ behaviour when faced with different combinations of parameter values. MtL will approach the parameter selection problem with the ML tasks referred to in Section 2.1.
Section 2.2 also noted that it is very important to choose a good starting point for a NN, i.e., its initial weights. The most common approach is to start the NN with a random set of weights. However, NN training consists of iteratively adjusting the connections’ weights to minimise the NN’s prediction error. If the initial weights are too far from the optimal values, training will take too long. Bearing this in mind, we wish to provide data scientists with a method to initialise NNs by transferring weights (relational knowledge transfer) from NNs previously trained on datasets from domains different from the one at hand (heterogeneous TL).
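The contrast between random and transferred initialisation can be illustrated with a minimal sketch. It is not the transfer method developed in this research: it assumes, purely for illustration, that a layer's weights are copied only when the source and target layers have the same shape, and that every other layer falls back to random values. The function names and the layer sizes are hypothetical.

```python
import random

def random_layer(n_in, n_out, rng, scale=0.1):
    """Return an n_in x n_out weight matrix of small random values."""
    return [[rng.uniform(-scale, scale) for _ in range(n_out)] for _ in range(n_in)]

def shape(layer):
    """Return (rows, columns) of a weight matrix stored as nested lists."""
    return (len(layer), len(layer[0]))

def init_from_source(target_sizes, source_weights, seed=0):
    """Initialise a target network layer by layer.

    target_sizes: layer widths of the target network, e.g. [5, 10, 1].
    source_weights: weight matrices of a previously trained network.
    Assumption: a source layer is copied only if its shape matches the
    target layer; otherwise the layer is initialised randomly.
    """
    rng = random.Random(seed)
    weights, transferred = [], []
    for i, (n_in, n_out) in enumerate(zip(target_sizes, target_sizes[1:])):
        src = source_weights[i] if i < len(source_weights) else None
        if src is not None and shape(src) == (n_in, n_out):
            weights.append([row[:] for row in src])  # copy, do not alias
            transferred.append(True)
        else:
            weights.append(random_layer(n_in, n_out, rng))
            transferred.append(False)
    return weights, transferred

# Hypothetical source network: 8 inputs, 10 hidden units, 1 output.
# Random values stand in for trained weights here.
source_rng = random.Random(42)
source = [random_layer(8, 10, source_rng), random_layer(10, 1, source_rng)]

# Target problem with 5 inputs: only the hidden-to-output layer matches.
target_weights, transferred = init_from_source([5, 10, 1], source)
print(transferred)
```

Because the source and target datasets come from different domains, their input dimensions will often differ, which is why the sketch transfers only the layers that are compatible.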
However, we need to make sure that the transfer does not harm the NN’s performance instead of improving it. We aim to use MtL to predict the transfer’s impact and thereby prevent negative transfer. For this, MtL will take advantage of previous transfer experiments to predict whether transferring between specific source and target datasets will have a positive or negative impact on the NN’s performance. This will depend on the characteristics of the source and target data, and also on characteristics of the transfer itself. We therefore need to design informative metafeatures capable of describing the source and target data, as well as the transfer behaviour. MtL will approach the source network selection problem with the ML tasks referred to in Section 2.1.
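As a rough sketch of this idea, the meta-level problem can be treated as a classification task over past transfer experiments. The example below uses a 1-nearest-neighbour rule as a stand-in meta-learner; the text does not name the algorithm here, so this choice is an assumption. The two metafeatures and all values are hypothetical placeholders, not experimental results.

```python
import math

# Placeholder meta-examples: (metafeature vector, observed impact).
# Both metafeatures and all values are hypothetical, not results.
past_transfers = [
    ([0.1, 0.2], "positive"),
    ([0.2, 0.1], "positive"),
    ([0.9, 0.8], "negative"),
]

def predict_transfer_impact(meta_examples, query):
    """Return the label of the past transfer closest to the query."""
    nearest = min(meta_examples, key=lambda ex: math.dist(ex[0], query))
    return nearest[1]

def safe_sources(meta_examples, candidates):
    """Keep only candidate sources whose predicted impact is positive."""
    return [
        name for name, features in candidates.items()
        if predict_transfer_impact(meta_examples, features) == "positive"
    ]

# Hypothetical candidate source networks for one target dataset.
candidates = {"source_a": [0.15, 0.15], "source_b": [0.8, 0.7]}
print(safe_sources(past_transfers, candidates))
```

Filtering the candidates before any weights are transferred is what prevents negative transfer in this sketch: a source predicted to hurt performance is never used.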
Chapter 3
Empirical Study of the Performance of Neural Networks
Machine Learning algorithms typically have parameters that enable their adaptation to new tasks. These added degrees of freedom also consume human and computational time, since finding a good combination of parameters is rarely trivial. Specifically, in Neural Network (NN) learning there are no generally accepted rules for parameter selection given a new learning problem. The commonly accepted solution is to perform a more or less intensive blind search in a promising region of the parameter space for each new dataset at hand.
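The cost of such a search is easy to see in a small sketch. The parameter names and candidate values below are hypothetical placeholders, not the grid used in this study; the point is only that the number of combinations is the product of the number of values per parameter.

```python
from itertools import product

# Hypothetical parameter grid; names and values are placeholders.
grid = {
    "hidden_units": [3, 5, 10],
    "learning_rate": [0.01, 0.1],
    "max_iterations": [100, 1000],
}

def parameterisations(grid):
    """Yield every combination of parameter values as a dict."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))

all_params = list(parameterisations(grid))
print(len(all_params))  # 3 * 2 * 2 combinations
```

Each combination requires training at least one network per dataset, so adding a single value to any parameter multiplies the total training effort.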
In this chapter we aim to answer RQ1 (Chapter 1): How do different parameter values impact the performance of NNs? To answer this question, we start by studying the impact of running neural networks with several different combinations of parameter values (parameterisations). We study the parameterisations’ average performance, as well as the impact of each parameter value on performance separately. However, since a parameterisation that is good on average may not be good for every dataset, we also study the robustness of the results of each parameterisation.
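The three views described above (average performance per parameterisation, impact of each parameter value, and robustness) can be sketched as simple aggregations over a table of errors. The error values below are placeholders invented for illustration, not results of this study, and using the standard deviation across datasets as the robustness measure is an assumption.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Placeholder errors (lower is better): results[parameterisation][dataset].
# A parameterisation is a tuple of (name, value) pairs. Not real results.
results = {
    (("hidden_units", 3), ("learning_rate", 0.01)): {"A": 0.30, "B": 0.50},
    (("hidden_units", 3), ("learning_rate", 0.1)): {"A": 0.20, "B": 0.60},
    (("hidden_units", 10), ("learning_rate", 0.01)): {"A": 0.25, "B": 0.35},
    (("hidden_units", 10), ("learning_rate", 0.1)): {"A": 0.10, "B": 0.90},
}

def summarise(results):
    """Mean error and spread across datasets for each parameterisation."""
    return {
        params: (mean(errors.values()), pstdev(errors.values()))
        for params, errors in results.items()
    }

def marginal_means(results):
    """Mean error of each parameter value, averaged over everything else."""
    buckets = defaultdict(list)
    for params, errors in results.items():
        for name_value in params:
            buckets[name_value].extend(errors.values())
    return {name_value: mean(errs) for name_value, errs in buckets.items()}

summary = summarise(results)
marginals = marginal_means(results)
for params, (avg, spread) in summary.items():
    print(dict(params), round(avg, 3), round(spread, 3))
```

In this toy table, the parameterisation with the lowest error on dataset A is also the one with the largest spread across datasets, which is exactly the situation that motivates looking at robustness in addition to the average.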
Next, we identify the datasets used for our study and the neural network implementation considered (Section 3.1). Then we define the experimental setup (Section 3.2) considered for the study described in this chapter, followed by the results obtained (Section 3.3). Just before the Summary (Section 3.5), this chapter also includes a “cheat-sheet” (Section 3.4) that can be used to select subsets of parameters that lead to high average performance. With this, the data scientist can significantly reduce the grid search needed.
3.1 Datasets and neural network implementation
We use the same datasets in all four phases of our research. These are described next, followed by the neural network implementation considered throughout the study.
3.1.1 Datasets
Throughout our research we use a group of benchmark regression datasets composed of numerical variables, collected from UCI (Lichman, 2013) and shown in Table 3.1.
Table 3.1: UCI Datasets used and number of datasets generated from them.
id      name                                  nr. examples  nr. attributes  nr. datasets
1       Airfoil Self-Noise                    1503          5               1
2 *     CBMNPP1                               11934         15              2
3       Combined Cycle Power Plant            9568          4               1
4       Communities and Crime                 1993          101             1
5 *     Communities and Crime Unnormalized    1901          119             18
6       Concrete Compressive Strength         1030          8               1
7       Computer Hardware                     208           9               1
8, 9    Challenger USA Space Shuttle O-Ring   23            2               2
10      Online News Popularity                39644         58              1
11 *    Parkinsons Telemonitoring             5875          21              2
12 *    Concrete Slump Test                   103           9               3
13      Buzz in social media                  28179         96              1
14, 15  Wine Quality                          1599/4898     11              2
16      Yacht Hydrodynamics                   308           6               1
Some of these datasets (marked with *) have more than one dependent variable. In that case, several datasets are created by splitting the original dataset by dependent variable. The result is the group of datasets shown in Table A.1 in Appendix A. With this, we can also evaluate and compare the behaviour of the neural networks on similar and different datasets.
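The splitting step can be sketched as follows, assuming that each derived dataset keeps all independent variables and exactly one dependent variable. The column names and values are hypothetical and do not come from any of the UCI datasets.

```python
def split_by_target(rows, feature_names, target_names):
    """Create one single-target dataset per dependent variable.

    rows: list of dicts mapping column name to value.
    Returns a dict: target name -> list of (feature vector, target value).
    """
    datasets = {}
    for target in target_names:
        datasets[target] = [
            ([row[f] for f in feature_names], row[target]) for row in rows
        ]
    return datasets

# Hypothetical dataset with two independent and two dependent variables.
rows = [
    {"x1": 1.0, "x2": 2.0, "y1": 10.0, "y2": 0.5},
    {"x1": 3.0, "x2": 4.0, "y1": 20.0, "y2": 0.7},
]
datasets = split_by_target(rows, ["x1", "x2"], ["y1", "y2"])
print(sorted(datasets))
```

The derived datasets share the same inputs and differ only in the variable to predict, which is what makes them suitable for comparing network behaviour on closely related problems.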