
The initial performance evaluation was carried out on the real collected SCD dataset, which includes 1896 observations. The empirical study uses models based on random forests and decision trees. To assess classification performance, each classifier was scored using the evaluation metrics. The training and testing sets were randomly reselected on each run.
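The repeated random partitioning described above can be sketched as follows. This is only an illustration: the test fraction and the seed are assumptions, since the text does not state the split ratio, and the helper name `random_split` is hypothetical. The number of runs follows the 30 repetitions reported later in this chapter.

```python
import random

def random_split(n_samples, test_fraction, rng):
    """Return (train_idx, test_idx) from a fresh shuffle of range(n_samples)."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

N_OBSERVATIONS = 1896   # size of the SCD dataset reported in the text
N_RUNS = 30             # number of repeated runs reported in the text
TEST_FRACTION = 0.3     # ASSUMPTION: the split ratio is not given in the text

rng = random.Random(0)  # fixed seed only so that the sketch is reproducible

# One independent train/test partition per run.
splits = [random_split(N_OBSERVATIONS, TEST_FRACTION, rng) for _ in range(N_RUNS)]
```

Each run then trains on `train_idx` and evaluates on `test_idx`, and the reported metrics are averaged over the runs.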

The results of our experiments are listed in Table 6.1, which shows the outcomes for training of the classifiers. The study also provides performance visualisations with ROC plots in Figure 6.1 and AUC plots in Figure 6.2. The AUC bar graphs give a visual comparison of the area under the ROC curve across the trained models. We built the random forest first, ran the SCD dataset down all the trees so that the proximity matrix was filled in, and then recalculated the proximity values. The experiments were grouped by the total number of trees used: RFC with 50, 100, 200, 400, and 500 trees, each evaluated on the performance metrics and accuracy. During training, RFC with 50 trees achieved the best accuracy and AUC, with 0.98156 and 0.99789, respectively. The sensitivity of RFC with 100 trees, 0.97856, was the highest of all the approaches.
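To make the proximity step concrete, the sketch below computes a proximity matrix in the usual random forest sense: the proximity of two samples is the fraction of trees in which both end in the same leaf. The leaf assignments are hypothetical values chosen for illustration and are not taken from the SCD experiments.

```python
def proximity_matrix(leaf_ids):
    """leaf_ids[t][i] is the leaf that sample i reaches in tree t.

    Returns an n x n matrix whose (i, j) entry is the fraction of trees
    in which samples i and j end in the same leaf.
    """
    n_trees = len(leaf_ids)
    n = len(leaf_ids[0])
    counts = [[0.0] * n for _ in range(n)]
    for tree_leaves in leaf_ids:
        for i in range(n):
            for j in range(n):
                if tree_leaves[i] == tree_leaves[j]:
                    counts[i][j] += 1.0
    # Normalise the co-occurrence counts by the number of trees.
    return [[c / n_trees for c in row] for row in counts]

# Hypothetical leaf assignments for 4 samples in 4 trees.
leaf_ids = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [2, 2, 3, 3],
    [1, 0, 0, 1],
]
prox = proximity_matrix(leaf_ids)
```

Samples 0 and 1 share a leaf in three of the four trees, so their proximity is 0.75; every sample has proximity 1.0 with itself.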


Table 6-1: Random Forest performance with average of 9 classes (Train)

Model      Sensitivity  Specificity  Precision  F1        J         Accuracy  AUC
RFC50/1    0.97844      0.98244      0.86944    0.91844   0.96089   0.98156   0.99789
RFC100/2   0.97856      0.98144      0.86522    0.91711   0.96022   0.98111   0.99656
RFC200/3   0.971        0.96956      0.786      0.86544   0.94067   0.96933   0.99533
RFC400/4   0.97011      0.96433      0.77156    0.85544   0.93444   0.96511   0.99378
RFC500/5   0.954778     0.942778     0.677444   0.786889  0.897667  0.944667  0.985333
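The columns of the table can be read as one-vs-rest metrics averaged over the classes. The sketch below shows one way to compute them. It assumes that J denotes Youden's index (sensitivity + specificity - 1), which agrees with the tabulated values to about three decimal places, and that accuracy is computed per class on the one-vs-rest counts; neither definition is stated explicitly in the text. The labels in the example are invented for illustration.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Metrics for one class treated as positive and all others as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "J": sensitivity + specificity - 1.0,  # ASSUMPTION: Youden's index
        "accuracy": (tp + tn) / len(pairs),    # ASSUMPTION: one-vs-rest accuracy
    }

def macro_average(y_true, y_pred):
    """Average each metric over all classes present in y_true."""
    classes = sorted(set(y_true))
    per_class = [one_vs_rest_metrics(y_true, y_pred, c) for c in classes]
    return {k: sum(m[k] for m in per_class) / len(classes) for k in per_class[0]}

# Invented labels for three classes, for illustration only.
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 2, 2, 2, 3, 1]
avg = macro_average(y_true, y_pred)
```

With 9 classes the same functions apply unchanged; each class is taken as the positive class in turn and the nine results are averaged.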

Figure 6-1: ROC curve (Train) For random forest classifier per number of trees

The random forest combines the simplicity of decision trees with flexibility, resulting in a vast improvement in accuracy. As described in Chapter 4, bootstrapping creates a new dataset of the same size as the original. The key property of bootstrapping is that the same sample may be selected more than once. Once the bootstrapped datasets had been created, we built a random forest from many decision trees, using only a random subset of variables (columns) at each step. At each step, we considered 13 attributes (13 columns) with 9 different classes representing the amount of medication. Using the subset of variables at each step, we created a new bootstrapped dataset and built a number of trees. This process was repeated hundreds of times. After running the data down all of the trees in the random forest, we determined which option received the most votes. The bagging process draws a number of bootstrap samples at random from the original dataset to create another dataset. Bagging is a useful and effective technique in random forests, where small alterations in the training or testing phase can affect the accuracy and performance of the model.

Because the label with the most votes wins, that label is assigned to each out-of-bag sample. In this case, the out-of-bag samples with 9 classes in our SCD dataset are accurately labelled by the RFC. The accuracy of the RFC can then be estimated as the proportion of out-of-bag samples that were correctly classified by the random forest model. Conversely, the proportion of out-of-bag samples that were incorrectly classified is called the out-of-bag error.
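The bootstrap, voting, and out-of-bag steps described above can be sketched in a few lines. This is a simplified illustration rather than the implementation used in the study: the base learner is a 1-nearest-neighbour rule standing in for a decision tree, the random feature subset is drawn once per learner rather than at each split, and the data are synthetic. All function names are hypothetical.

```python
import random
from collections import Counter

def bootstrap_indices(n, rng):
    """Draw n indices with replacement; some repeat and some are left out."""
    return [rng.randrange(n) for _ in range(n)]

def nearest_neighbour_predict(train_rows, train_labels, features, row):
    """Stand-in base learner (NOT a decision tree): 1-NN on a feature subset."""
    def dist(other):
        return sum((other[f] - row[f]) ** 2 for f in features)
    best = min(range(len(train_rows)), key=lambda k: dist(train_rows[k]))
    return train_labels[best]

def oob_error(rows, labels, n_learners, n_features_per_learner, rng):
    """Proportion of out-of-bag samples whose majority vote is wrong."""
    n = len(rows)
    n_features = len(rows[0])
    votes = [Counter() for _ in range(n)]
    for _ in range(n_learners):
        in_bag = bootstrap_indices(n, rng)
        in_bag_set = set(in_bag)
        features = rng.sample(range(n_features), n_features_per_learner)
        bag_rows = [rows[i] for i in in_bag]
        bag_labels = [labels[i] for i in in_bag]
        for i in range(n):
            if i not in in_bag_set:  # only learners that never saw sample i vote
                pred = nearest_neighbour_predict(bag_rows, bag_labels, features, rows[i])
                votes[i][pred] += 1
    scored = [i for i in range(n) if votes[i]]
    if not scored:
        raise ValueError("no sample was ever out of bag")
    wrong = sum(1 for i in scored if votes[i].most_common(1)[0][0] != labels[i])
    return wrong / len(scored)

# Synthetic, well-separated two-class data (illustration only).
rng = random.Random(42)
rows, labels = [], []
for label, centre in ((0, 0.0), (1, 10.0)):
    for _ in range(20):
        rows.append([rng.gauss(centre, 1.0) for _ in range(3)])
        labels.append(label)

error = oob_error(rows, labels, n_learners=25, n_features_per_learner=2, rng=rng)
```

Because every sample is scored only by learners whose bootstrap sample excluded it, the out-of-bag error acts as a built-in validation estimate that requires no separate hold-out set.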

Figure 6-2: AUC Histogram (Train) for random forest classifier per number of trees

RFC/50 and RFC/100 were found to perform almost identically, both ranking best on the training set. Their AUC values, averaged over the 9 classes, are 0.99789 and 0.99656, with accuracies of 0.98156 and 0.98111, respectively. Consistent with the results obtained from RFC/400 and RFC/500, it was found that the outcomes for Class 9 show the largest differential between the training with 1. On further examination of the results from RFC/200, RFC/400, and RFC/500, it was found that, despite the appearance of reasonable AUC values during training, the model had converged to a particularly narrow output range. This suggests that the training process was unable to achieve clear correspondence with the classification targets, arriving instead at marginal responses. Further confirmation is reflected in the sensitivities and specificities obtained for these models, whose values fluctuate between opposite extremes.


Figure 6-3: ROC curve (Testing) for random forest classifier per number of trees

This study investigated the performance of random forest models with different numbers of trees and compared them with one another on the oversampled SCD datasets. As mentioned earlier, the experiments were carried out using the original datasets with 14 variables and 9 classes (a multi-class problem). The testing set outcomes for the SCD datasets are given in Table 6.2. RFC100 obtained the best AUC with 0.91689, and RFC200 received the best accuracy, while RFC with 500 trees acquired the lowest outcome on the AUC measure, with an average over the 9 classes of 0.90333. Compared with other single classifiers, RFC yields high accuracy and AUC rates, marked in bold. In terms of sensitivity and specificity averaged over the 9 classes, RFC400 yields the best results, 0.86044 and 0.86167, respectively, giving it better classification performance than the other approaches. Figures 6.3 and 6.4 illustrate the ROC curve and AUC (Testing), respectively, for the random forest approaches per number of trees. The ROC curve plots the true positive rate against the false positive rate. In the ROC graph, RFC50 performed best during the training and testing process.
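As an illustration of how an ROC curve and its AUC are obtained from true positive and false positive rates, the sketch below sweeps the decision threshold over a set of classifier scores and integrates the resulting curve with the trapezoidal rule. The scores and labels are invented. Treating each of the 9 classes as positive in turn and averaging the resulting AUC values is assumed here as the multi-class extension; the text does not spell out the averaging scheme.

```python
def roc_points(scores, is_positive):
    """Sweep the threshold from high to low; return (FPR, TPR) pairs."""
    n_pos = sum(is_positive)
    n_neg = len(is_positive) - n_pos
    ranked = sorted(zip(scores, is_positive), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for k, (score, positive) in enumerate(ranked):
        if positive:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample sharing this score (ties).
        if k + 1 == len(ranked) or ranked[k + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Trapezoidal area under the piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Invented scores for one class treated as positive (illustration only).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
is_positive = [True, True, False, True, False, False]

curve = roc_points(scores, is_positive)
area = auc(curve)
```

In this example eight of the nine positive-negative pairs are ranked correctly, so the area is 8/9.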


Table 6-2: Random forest performance with average of 9 classes (Test)

Model      Sensitivity  Specificity  Precision  F1        J         Accuracy  AUC
RFC50/1    0.830444     0.828889     0.373667   0.504333  0.659222  0.829556  0.888111
RFC100/2   0.817778     0.837111     0.372667   0.505778  0.655111  0.837889  0.884889
RFC200/3   0.813889     0.836444     0.372667   0.505     0.650444  0.836222  0.878111
RFC400/4   0.86044      0.85111      0.42278    0.54978   0.71144   0.84967   0.91644
RFC500/5   0.847        0.840222     0.404      0.529889  0.687111  0.839222  0.903333

Further experiments show that the chosen dataset exhibits significant non-linear relationships, presenting a challenge for the RFC test models. The RFC classifiers outperformed other single classifiers, demonstrating capability both in fitting the training data and in generalising to unseen examples. Subsequently, a single operating point was selected to illustrate a final classification decision; the performance at the chosen rejection threshold varied between the training and testing sets for Classes 1, 5, and 9, as reflected earlier in the AUC values. Classes 2 and 7 showed reasonably consistent performance between the train and test sets for this model. It is possible that the reasonable performance obtained for the RFC architecture with various numbers of trees, in contrast with the poor performance of other machine learning algorithm types, such as ROM and LNN, could point to a detrimental effect on the outputs in the classification setting. To obtain better classification accuracy and performance, we used a tree bagger with 50, 100, 200, 400, and 500 trees. Each run was repeated 30 times. To evaluate the random forest, it is necessary to check the total number of features, which in our study is 13 out of the 14 features, namely those that most doctors concentrate on when classifying the amount of medication.


Figure 6-4: AUC Histogram (Testing) for random forest classifier per number of trees

The tree bagger produces in-bag samples by oversampling target values (classes) with high misclassification costs and undersampling target values with low misclassification costs. Therefore, the out-of-bag samples contain fewer observations from classes with high misclassification costs and more observations from classes with low misclassification costs. When a classification ensemble is trained on a dataset that is not large and with a skewed cost matrix, the total number of out-of-bag observations per class can be considerably low. Consequently, the estimated out-of-bag error has a large variance and is difficult to interpret.
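The effect of cost-weighted sampling on the out-of-bag counts can be illustrated with a short calculation. The sketch assumes that each observation is drawn with probability proportional to the misclassification cost of its class and that a bootstrap sample has as many draws as there are observations. This is a simplified model for illustration, not a description of the tree bagger's internal procedure, and the class sizes and costs are invented.

```python
def expected_oob_counts(class_sizes, class_costs):
    """Expected out-of-bag observations per class for one weighted bootstrap.

    ASSUMPTION: sampling weight of an observation is proportional to the
    misclassification cost of its class; draws are with replacement.
    """
    n_draws = sum(class_sizes.values())
    total_weight = sum(class_sizes[c] * class_costs[c] for c in class_sizes)
    expected = {}
    for c, size in class_sizes.items():
        p_draw = class_costs[c] / total_weight  # chance per draw for one observation
        p_oob = (1.0 - p_draw) ** n_draws       # never selected in any draw
        expected[c] = size * p_oob
    return expected

# Invented example: a small, costly class next to a large, cheap one.
class_sizes = {"low_cost": 90, "high_cost": 10}
skewed = expected_oob_counts(class_sizes, {"low_cost": 1.0, "high_cost": 9.0})
uniform = expected_oob_counts(class_sizes, {"low_cost": 1.0, "high_cost": 1.0})
```

With equal costs roughly 37% of each class is expected to be out of bag, whereas under the skewed costs the high-cost class is expected to contribute less than one out-of-bag observation per bootstrap sample, which is why the error estimate for that class becomes unstable.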