According to the literature (see Table 5.1), ensemble learning is one of the most commonly used approaches and among the best performing. Ensemble learning aims to build machine learning models with better performance by combining several models. In general, researchers [76] have shown that a combination of several models is more likely to give better predictions than a single model. Many recent data science competitions have been won using ensemble algorithms such as random forest (RF) and XGBoost [77], [78].

Bootstrap aggregating (bagging) is an ensemble method that uses a collection of bootstrap data samples to fit multiple models, usually from the same algorithm family, such as decision trees [79]. Fitting several models on different views of the main dataset and then averaging their predictions helps to reduce the instability of the predictions. Bootstrap sampling draws random samples from the original dataset with replacement. Training on different bootstrap samples yields a different learned hypothesis for a given instance, and averaging those predictions, or opinions, can give better overall performance. Because of this random sampling, bagging is more effective on noisy data than a single model [2].
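The bagging procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the cited work: the base learner (a one-split regression stump), the function names and the toy dataset are all assumptions introduced here for the example.

```python
import random
from statistics import mean

def bootstrap_sample(data, rng):
    """Draw len(data) items uniformly at random, with replacement."""
    return [rng.choice(data) for _ in data]

def fit_stump(data):
    """Fit a one-split regression stump on (x, y) pairs (hypothetical base learner)."""
    best = None
    xs = sorted({x for x, _ in data})
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate threshold between two observed x values
        left = [y for x, y in data if x <= t]
        right = [y for x, y in data if x > t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # every x in the sample is identical: constant model
        m = mean(y for _, y in data)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

def bagging_fit(data, n_models, seed=0):
    """Fit one stump per bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]

def bagging_predict(models, x):
    """Average the predictions of all fitted models."""
    return mean(m(x) for m in models)

# Toy data (assumed for illustration): y rises with x.
data = list(zip(range(10), [0, 0, 1, 1, 1, 9, 9, 10, 10, 10]))
models = bagging_fit(data, n_models=25)
print(bagging_predict(models, 0), bagging_predict(models, 9))
```

Each stump sees a slightly different view of the data, so the individual thresholds vary between models; the averaged prediction is what smooths out that variation.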

RF is an ensemble-based classifier, which means that it consists of a collection of sub-models that make a joint decision. An RF comprises several decision tree classifiers. These trees are built using bootstrap samples of the full training dataset, which results in potential differences between the trees, as the importance ranking of features may differ from tree to tree. Relying on multiple decision trees to reach a judgement makes RF classifiers more robust and less prone to overfitting than single decision trees and other non-ensemble methods [80].

Figure 2.8 Random Forest generic mechanism.

The whole dataset is divided into n samples and each sample is used for building a single DT. Then, in the final stage, each model prediction is combined for the final prediction.

RF is one of the well-known examples of a bagging method [79]. Bagging methods generally work by having multiple equally weighted base learners, each trained on a subset of the whole dataset. RF adds a step to the traditional concept of bagging: it selects a subset of features instead of using the whole feature list. New, unseen data is predicted by submitting the feature list of the unseen example to all m trees in the forest, collecting their predictions, and then creating a final prediction based on the average of all the trees’ predictions. Compared to a single decision tree, RF is better at handling noisy data and less prone to overfitting.
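The two ingredients described above, bootstrap sampling of the rows plus a random subset of the features, followed by averaging over all m trees, can be sketched as follows. This is a simplified illustration under stated assumptions: each "tree" is reduced to a single-split stump, and the names `fit_forest`, `forest_predict`, `m_trees` and `k_features` are hypothetical, not taken from any library.

```python
import random

def fit_stump(rows, labels, features):
    """Pick the (feature, threshold, left label) with the fewest training errors,
    searching only over the given feature subset."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            for left_label in (0, 1):
                preds = [left_label if r[f] <= t else 1 - left_label for r in rows]
                errors = sum(p != y for p, y in zip(preds, labels))
                if best is None or errors < best[0]:
                    best = (errors, f, t, left_label)
    _, f, t, left_label = best
    return lambda r: left_label if r[f] <= t else 1 - left_label

def fit_forest(rows, labels, m_trees, k_features, seed=0):
    """Train m_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(m_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap the rows
        feats = rng.sample(range(d), k_features)     # random subset of features
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest

def forest_predict(forest, row):
    """Average the 0/1 votes of all trees and threshold at 0.5."""
    avg = sum(tree(row) for tree in forest) / len(forest)
    return 1 if avg >= 0.5 else 0

# Toy data (assumed for illustration): two features, two classes.
rows = [(0, 0), (1, 1), (2, 2), (8, 8), (9, 9), (10, 10)]
labels = [0, 0, 0, 1, 1, 1]
forest = fit_forest(rows, labels, m_trees=25, k_features=1)
print(forest_predict(forest, (0, 0)), forest_predict(forest, (10, 10)))
```

A full RF grows deep trees and redraws the feature subset at every split; with single-split stumps the per-tree and per-split subsets coincide, which keeps the sketch short.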

Extremely randomised trees [81] is another example of a bagging method. The algorithm follows a similar methodology to RF but differs in how the feature set is selected and how the cut-off point is found. Unlike RF, it does not compute the best feature for the split node or the best split value for the selected features. The term ‘extremely random’ therefore refers to the random selection of both the features and the cut-off point.
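The difference in how the cut-off point is chosen can be illustrated with a short sketch. It assumes Gini impurity as the split criterion and a cut-off drawn uniformly between the smallest and largest observed feature value; the function names are hypothetical and chosen only for this example.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def split_impurity(values, labels, threshold):
    """Size-weighted Gini impurity of the two sides of a split."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    return (len(left) * gini(left) + len(right) * gini(right)) / n

def best_cut(values, labels):
    """Exhaustive search: try every midpoint and keep the purest split."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda t: split_impurity(values, labels, t))

def random_cut(values, rng):
    """Extremely randomised style: draw the cut-off at random, no search."""
    return rng.uniform(min(values), max(values))

values = [1, 2, 3, 7, 8, 9]
labels = [0, 0, 0, 1, 1, 1]
rng = random.Random(0)
t_best = best_cut(values, labels)
t_rand = random_cut(values, rng)
print(t_best, split_impurity(values, labels, t_best))
print(t_rand, split_impurity(values, labels, t_rand))
```

The random cut-off is cheaper to obtain because it skips the search, at the cost of a split that is at best as pure as the searched one; the ensemble compensates by combining many such trees.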

Figure 2.9 Boosting general mechanism

The whole dataset is used in all n iterations of the training of boosting models.

Boosting is an ensemble learning method that aims to improve the performance of a weak model by giving weight to the misclassified instances [82]. A weak model, in the context of boosting, is a model that has no prior guidance about the data, so its performance is likely to be low. Boosting works by repeatedly training this weak model on the same training dataset, but in every iteration the algorithm adds extra weight to the examples that the model could not classify correctly. The final classifier is produced by a series of enhancements and adjustments to the first weak learner, yielding an ensemble model that is likely to give higher performance [83].

Unlike bagging, boosting algorithms do not bootstrap samples from the dataset or sample it in any other way. Boosting algorithms use the whole dataset for training, but the weights of the training examples are adjusted in every iteration. Every time the model is trained on the dataset, it evaluates itself and increases the weight of the misclassified examples, then passes these back into the model; the number of iterations is specified by the user, based on acceptable performance levels. This focus on misclassified examples makes boosting one of the strongest ensemble learning methods. Although the technique can be robust in classifying difficult examples, its performance can decrease dramatically when noisy or mislabelled examples exist in the dataset, as boosting algorithms will give high weight to those noisy examples in trying to fit them.

The adaptive boosting algorithm (AdaBoost) is one example of the boosting concept. It starts with a weak learner trained on a dataset; the output is then evaluated, and more weight is given to the misclassified examples. The number of iterations of training and reweighting is specified by the user. Suitable values for such hyper-parameters are found using tuning methods such as MDA and grid search. Depending on the dataset, AdaBoost can show better results than RF. Small and noisy datasets, however, are better fitted by RF than by AdaBoost, as AdaBoost is prone to give too much weight to the noisy examples [79].
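The train, evaluate and reweight loop can be made concrete with a small sketch. It assumes the classic discrete AdaBoost formulation with labels in {-1, +1}, one-dimensional threshold stumps as the weak learner, and the usual model weight alpha = 0.5 ln((1 - err) / err); none of these details are specified in the text above, and the function names are hypothetical.

```python
import math

def stump_predict(t, pol, x):
    """Predict pol for x <= t and -pol otherwise (pol is +1 or -1)."""
    return pol if x <= t else -pol

def fit_weighted_stump(xs, ys, w):
    """Return (weighted_error, threshold, polarity) of the best 1-D stump."""
    best = None
    for t in sorted(set(xs)):
        for pol in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if stump_predict(t, pol, x) != y)
            if best is None or err < best[0]:
                best = (err, t, pol)
    return best

def adaboost_fit(xs, ys, n_rounds):
    """Run n_rounds of boosting; return the ensemble and the final example weights."""
    n = len(xs)
    w = [1.0 / n] * n          # start with uniform example weights
    ensemble = []              # list of (alpha, threshold, polarity)
    for _ in range(n_rounds):
        err, t, pol = fit_weighted_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # weight of this weak learner
        ensemble.append((alpha, t, pol))
        # Increase the weight of misclassified examples, decrease the rest.
        w = [wi * math.exp(-alpha * y * stump_predict(t, pol, x))
             for x, y, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble, w

def adaboost_predict(ensemble, x):
    """Sign of the alpha-weighted sum of the weak learners' votes."""
    score = sum(a * stump_predict(t, pol, x) for a, t, pol in ensemble)
    return 1 if score >= 0 else -1

# Toy data (assumed for illustration): no single threshold separates the classes.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
for rounds in (1, 3):
    model, _ = adaboost_fit(xs, ys, rounds)
    errors = sum(adaboost_predict(model, x) != y for x, y in zip(xs, ys))
    print(rounds, "round(s):", errors, "training errors")
```

The same mechanism explains the sensitivity to noise mentioned above: an example that no stump can fit keeps being misclassified, so its weight keeps growing and later learners are pulled towards it.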

eXtreme gradient boosting, known as XGBoost [78], is a newer implementation of the gradient boosting trees algorithm. Its enhancements include improved learning algorithms, including tree learning, and greater speed and scalability. Due to its high performance and ability to work in a distributed environment, it has become popular in many data science competitions. The library is open source and well documented, and it is supported and implemented for multiple programming languages (e.g. R and Python).

Generally, the main difference between boosting and bagging is that bagging uses different bootstrapped samples to train several models and then applies equally weighted voting, whereas boosting trains one model several times on the whole dataset but adjusts the weights of misclassified samples every time. Voting in boosting is weighted based on the performance of the model at every learning iteration. These differences make boosting and bagging quite different in their learning hypotheses and predictions.
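The contrast between the two voting schemes can be shown with a tiny example. The votes and weights below are invented purely for illustration, and labels are assumed to be -1 or +1.

```python
def equal_vote(predictions):
    """Bagging style: every model's vote counts the same."""
    return 1 if sum(predictions) >= 0 else -1

def weighted_vote(predictions, weights):
    """Boosting style: each vote is scaled by that model's weight."""
    score = sum(w * p for w, p in zip(weights, predictions))
    return 1 if score >= 0 else -1

# Three hypothetical models disagree on one example.
votes = [1, 1, -1]
weights = [0.2, 0.3, 0.9]   # assumed per-model weights, e.g. from training performance

print(equal_vote(votes))              # two of three models vote +1
print(weighted_vote(votes, weights))  # the heavily weighted model outvotes the other two
```

With equal weights the majority decides; with performance-based weights a single strong model can override several weaker ones, which is why the two schemes can disagree on the same set of votes.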