
Experiments in the previous section showed that, in most cases, random undersampling significantly improves the quality of models that predict disengagement behaviours. First-purchase prediction is another task in which event-frequency-based data representation performed competitively, and its data have been shown to be highly biased in all three of the games we worked on. In this section, the performance of classifiers trained on the raw (highly biased) dataset is compared with that of classifiers trained on data balanced by random undersampling, to determine whether random undersampling always helps in this highly biased task. Because the bias in these experiments is also severe, random undersampling is again expected to bring improvements. The experiments covered are depicted in Figure 7.3.

7.3.1 Experiment Information

Most of the experimental settings used in this section are the same as those discussed in Section 5.2.2. However, as Figure 7.3 shows, a change is made at the stage in which the balancing approach is selected. This is done to determine whether random undersampling can help with the first-purchase prediction problem.
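The balancing stage that changes in this experiment can be illustrated with a short sketch of random undersampling. Majority-class examples (here, non-purchasers) are discarded at random until both classes have the same size. The function name `random_undersample`, the 1:1 target ratio and the fixed seed are illustrative assumptions. They are not the exact settings of the pipeline in Figure 7.3.

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly discard examples from the larger classes until every class
    has as many examples as the smallest one (1:1 for binary labels).

    Illustrative sketch: the 1:1 target ratio and the seed are assumptions.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    minority_size = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        # Sample without replacement, so no player is kept twice.
        for sample in rng.sample(group, minority_size):
            balanced.append((sample, label))

    rng.shuffle(balanced)  # avoid long runs of a single class in the output
    return [s for s, _ in balanced], [lab for _, lab in balanced]


if __name__ == "__main__":
    # Placeholder features; class counts match I Am Playr (489 buyers, 88,568 non-buyers).
    players = list(range(489 + 88_568))
    first_purchase = [1] * 489 + [0] * 88_568
    kept, kept_labels = random_undersample(players, first_purchase)
    print(len(kept), sum(kept_labels))  # 978 489
```

On the I Am Playr class counts, the sketch keeps only 978 of 89,057 players. This shows concretely how much data the method discards on this task.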

7.3.2 Experiment Details and Results

I Am Playr

As explained in Section 5.2.2, the first-purchase problem is also highly biased. When the data are labelled with the first-purchasing labelling method introduced in that section, there are 489 players who purchased at least one item and 88,568 non-paying players. This ratio of about 1:181.12 is more severe than that of the datasets used in the disengagement-prediction problems. After event-frequency-based data representation was applied, 2,460 events were included as features representing the players' behaviours.

Table 7.4: Performance for predicting first-purchasing behaviours with event-frequency-based data representation on the dataset of I Am Playr (with and without balancing)

Classifier            Without balancing   Undersampled   T-Value      P-Value      Cohen's d
AUPRC (first purchase)
LogisticRegression    0.71±0.0189         0.39±0.0161    1.2335e+01   3.2349e-10   5.5164
DecisionTree          0.76±0.0107         0.49±0.0075    1.9210e+01   1.9296e-13   8.5909
SVM                   0.24±0.0636         0.07±0.0068    2.5111e+00   2.1800e-02   1.1230
AUPRC (non-first purchase)
LogisticRegression    1.00±0.0002         1.00±0.0002    -1.7919e+00  8.9978e-02   -0.8014
DecisionTree          1.00±0.0001         1.00±0.0000    -5.6973e+00  2.1072e-05   -2.5479
SVM                   1.00±0.0001         1.00±0.0000    -1.6818e+00  1.0987e-01   -0.7521
AUROC
LogisticRegression    0.95±0.0101         0.98±0.0049    -2.1638e+00  4.4174e-02   -0.9677
DecisionTree          0.90±0.0101         0.97±0.0043    -5.9087e+00  1.3594e-05   -2.6425
SVM                   0.95±0.0036         0.96±0.0021    -1.5067e+00  1.4924e-01   -0.6738
KAPPA
LogisticRegression    0.67±0.0119         0.16±0.0037    3.9600e+01   5.8186e-19   17.7097
DecisionTree          0.74±0.0104         0.12±0.0051    5.0791e+01   6.8506e-21   22.7145
SVM                   0.02±0.0159         0.07±0.0024    -3.3961e+00  3.2191e-03   -1.5188

The results of the experiment are given in Table 7.4, which uses the same notation as the experiments in the previous chapters. Surprisingly, they differ considerably from what was observed in the disengagement-prediction cases. As the table shows, random undersampling failed to bring significant improvements in most cases. On the contrary, the classifiers trained on the raw dataset performed significantly better in four of the 12 cases, whereas random undersampling improved performance in only three of the 12. Moreover, one of those three improvements is less informative. It occurs when the area under the PR curve is computed with the non-first purchasers as positive examples, and because of the bias, even a random guess can reach a similar score in this case. The experiments in Section 5.2.2 show that classifiers trained with undersampling achieve significantly better performance than a random classifier. Even so, the observation in this experiment is a warning about using random undersampling. The method balances a dataset by removing data points, and in a dataset as highly biased as this one, too much data must be removed. When important information is removed, the classifiers can also be negatively affected. Note, however, that random undersampling did not help as much here as in the disengagement-prediction tasks, yet it still helps when the target is a classifier that predicts both classes well (as measured by the area under the ROC curve). In that case, random undersampling helps the decision tree achieve a significantly better result, and in the other two cases the resulting classifiers perform on par with those trained on the raw dataset. This observation is consistent with what was seen when predicting disengagement.
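The caveat about the non-first-purchase AUPRC rows can be checked numerically. For a classifier that scores players at random, the expected area under the PR curve is close to the prevalence of the positive class. With non-purchasers as positives, that prevalence is about 0.9945 in I Am Playr. The sketch below estimates AUPRC with average precision, one common estimator. Whether the thesis used this estimator or another one is not stated here. The helper `average_precision` is hypothetical and written only for illustration.

```python
import random


def average_precision(scores, labels):
    """Average precision with label 1 as the positive class: the mean of the
    precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_positive = sum(labels)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / n_positive


if __name__ == "__main__":
    rng = random.Random(42)
    n_buyers, n_non_buyers = 489, 88_568  # I Am Playr class counts
    # Non-purchasers are the positive class, as in the "non-first purchase" rows.
    labels = [0] * n_buyers + [1] * n_non_buyers
    scores = [rng.random() for _ in labels]  # a classifier that guesses at random
    print(f"prevalence = {n_non_buyers / len(labels):.4f}")  # prevalence = 0.9945
    print(f"random AP  = {average_precision(scores, labels):.4f}")
```

The random-guess score lands next to the prevalence. A value of 1.00 in those rows therefore says little about how well either classifier separates the two classes.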

Lyroke

To see whether the problem found in I Am Playr also occurs in other games, the same experiments were run on Lyroke. As introduced in Section 5.2.2, when labelled by the first-purchasing labelling method, the raw dataset of this game is even more biased than that of I Am Playr, with a ratio of 1:549.76. There are 509 first-paying users and 279,829 players who have not purchased any game item. When event-frequency-based data representation was used, a total of 7,832 events were used as features for training classifiers.

CHAPTER 7. BIASED PLAYER BEHAVIOUR MODELLING 113

Table 7.5: Performance for predicting first-purchasing behaviours with event-frequency-based data representation on the dataset of Lyroke (with and without balancing)

Classifier            Without balancing   Undersampled   T-Value      P-Value      Cohen's d
AUPRC (first purchase)
LogisticRegression    0.49±0.0254         0.21±0.0145    9.1937e+00   3.2039e-08   4.1116
DecisionTree          0.95±0.0080         0.41±0.0093    4.1697e+01   2.3208e-19   18.6474
SVM                   0.06±0.0125         0.04±0.0025    1.2495e+00   2.2750e-01   0.5588
AUPRC (non-first purchase)
LogisticRegression    1.00±0.0000         1.00±0.0001    1.0716e+00   2.9807e-01   0.4792
DecisionTree          1.00±0.0000         1.00±0.0000    -2.3137e+00  3.2710e-02   -1.0347
SVM                   1.00±0.0000         1.00±0.0000    -1.8741e+00  7.7248e-02   -0.8381
AUROC
LogisticRegression    0.98±0.0030         0.97±0.0048    1.4202e+00   1.7264e-01   0.6351
DecisionTree          0.96±0.0063         0.97±0.0013    -2.3751e+00  2.8864e-02   -1.0622
SVM                   0.97±0.0058         0.97±0.0007    -1.3897e+00  1.8158e-01   -0.6215
KAPPA
LogisticRegression    0.48±0.0261         0.06±0.0013    1.5043e+01   1.2294e-11   6.7272
DecisionTree          0.94±0.0075         0.05±0.0036    1.0292e+02   2.1609e-26   46.0260
SVM                   0.00±0.0000         0.03±0.0013    -2.1088e+01  3.8518e-14   -9.4309

Table 7.5 shows how event-frequency-based data representation performs in predicting first-purchase behaviours on the raw and the undersampled datasets. As in I Am Playr, random undersampling is not able to significantly improve the performance of the classifiers. Again, the classifiers trained on the raw dataset performed significantly better in four of the 12 cases, whereas random undersampling helped significantly in only one case. This confirms that random undersampling can be risky when too much information must be removed. However, the AUROC pattern found for I Am Playr holds here too. If the target is to train classifiers that predict both classes well (measured by the area under the ROC curve), random undersampling did not significantly improve the classifiers, but it did not have any negative effect either.
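The KAPPA rows in Table 7.5 include a raw-data SVM score of 0.00±0.0000. One way such a score arises is a classifier that labels every player as a non-purchaser. On a dataset this biased its accuracy is almost perfect, but Cohen's kappa, which discounts the agreement expected by chance, is exactly zero. The helper `cohens_kappa` below is a minimal illustration of the standard formula, not code from the thesis. It does not establish what the SVM actually predicted.

```python
def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from the label frequencies."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_e = sum(
        (y_true.count(c) / n) * (y_pred.count(c) / n)
        for c in set(y_true) | set(y_pred)
    )
    if p_e == 1.0:
        return float("nan")  # undefined: both lists contain only the same single class
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    n_buyers, n_non_buyers = 509, 279_829  # Lyroke class counts
    y_true = [1] * n_buyers + [0] * n_non_buyers
    y_pred = [0] * len(y_true)  # predicts "no purchase" for every player
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(f"accuracy = {accuracy:.4f}, kappa = {cohens_kappa(y_true, y_pred):.2f}")
    # accuracy = 0.9982, kappa = 0.00
```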

Race Team Manager

Finally, the same experiments were run on Race Team Manager, where there are 511 first purchasers and 170,333 players who did not make any purchases. The bias ratio in this game is about 1:333.33, so, as in the previous two games, the dataset is highly biased. When event-frequency-based data representation is applied, 1,531 events are included.
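The bias ratios quoted for the three games follow from simple arithmetic on the class counts, and so does the amount of data a fully balancing undersampler would discard. This sketch assumes balancing to a 1:1 ratio, and `undersampling_cost` is a hypothetical helper.

```python
# Class counts reported in this section: (first purchasers, non-purchasers).
GAMES = {
    "I Am Playr": (489, 88_568),
    "Lyroke": (509, 279_829),
    "Race Team Manager": (511, 170_333),
}


def undersampling_cost(n_minority, n_majority):
    """Return (imbalance ratio, share of majority examples that a 1:1
    random undersampling would discard)."""
    ratio = n_majority / n_minority
    discarded = (n_majority - n_minority) / n_majority
    return ratio, discarded


if __name__ == "__main__":
    for game, (buyers, non_buyers) in GAMES.items():
        ratio, discarded = undersampling_cost(buyers, non_buyers)
        print(f"{game}: 1:{ratio:.2f}, discards {discarded:.2%} of non-purchasers")
    # I Am Playr: 1:181.12, discards 99.45% of non-purchasers
    # Lyroke: 1:549.76, discards 99.82% of non-purchasers
    # Race Team Manager: 1:333.33, discards 99.70% of non-purchasers
```

Under that assumption, more than 99% of the non-purchasers are removed in every game. This is the "too much data needs to be removed" risk discussed for I Am Playr.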

Table 7.6 displays the performance of classifiers trained on the raw dataset and on the dataset balanced by random undersampling. Unlike in the other two games, random undersampling significantly improved performance in four of the 12 cases here, whereas the classifiers trained on the raw dataset performed significantly better in two of the 12. Comparing these results with those of the other two games shows that whether random undersampling helps depends on the dataset. However, when performance is measured by the area under the ROC curve, random undersampling brought significant benefits in two of the three cases, which matches the observations made for the other two games.
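Tables 7.4 to 7.6 report a t-value, a p-value and Cohen's d for every comparison, but this section does not restate the test. The sketch below assumes an independent two-sample t-test with pooled variance over per-run scores. The difference is taken as raw minus undersampled, which matches the tables' sign convention (positive values favour the raw dataset). That assumption is consistent with the reported figures: every t-value is about √5 ≈ 2.236 times its effect size (for example, 1.2335e+01 / 5.5164 ≈ 2.236). This formula gives exactly that ratio with ten runs per condition. The run count is an inference from the numbers and is not stated here, and `two_sample_t_and_d` is a hypothetical helper.

```python
import math
import statistics


def two_sample_t_and_d(a, b):
    """Student's two-sample t statistic (pooled variance) and Cohen's d
    (mean difference over the pooled standard deviation) for per-run scores."""
    n_a, n_b = len(a), len(b)
    mean_diff = statistics.mean(a) - statistics.mean(b)
    pooled_var = (
        (n_a - 1) * statistics.variance(a) + (n_b - 1) * statistics.variance(b)
    ) / (n_a + n_b - 2)
    pooled_sd = math.sqrt(pooled_var)
    t = mean_diff / (pooled_sd * math.sqrt(1 / n_a + 1 / n_b))
    d = mean_diff / pooled_sd
    return t, d


if __name__ == "__main__":
    # Hypothetical per-run AUROC scores, NOT data from the thesis.
    raw = [0.90, 0.89, 0.91, 0.90, 0.88, 0.92, 0.90, 0.89, 0.91, 0.90]
    balanced = [0.97, 0.96, 0.97, 0.98, 0.97, 0.96, 0.97, 0.97, 0.98, 0.97]
    t, d = two_sample_t_and_d(raw, balanced)
    # With ten runs per condition, t / d = sqrt(10 / 2) = sqrt(5), about 2.2361.
    print(f"t = {t:.3f}, d = {d:.3f}, t/d = {t / d:.4f}")
```

If SciPy is available, `scipy.stats.ttest_ind(raw, balanced)` returns the same t statistic together with a two-sided p-value.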

Table 7.6: Performance for predicting first-purchasing behaviours with event-frequency-based data representation on the dataset of Race Team Manager (with and without balancing)

Classifier            Without balancing   Undersampled   T-Value      P-Value      Cohen's d
AUPRC (first purchase)
LogisticRegression    0.23±0.0139         0.18±0.0129    2.4944e+00   2.2570e-02   1.1155
DecisionTree          0.29±0.0217         0.32±0.0120    -1.2500e+00  2.2732e-01   -0.5590
SVM                   0.08±0.0111         0.09±0.0090    -4.4975e-01  6.5826e-01   -0.2011
AUPRC (non-first purchase)
LogisticRegression    1.00±0.0003         1.00±0.0004    -2.0845e+00  5.1630e-02   -0.9322
DecisionTree          0.99±0.0002         1.00±0.0000    -3.0347e+01  6.5414e-17   -13.5718
SVM                   1.00±0.0003         1.00±0.0001    -2.8128e+00  1.1515e-02   -1.2579
AUROC
LogisticRegression    0.86±0.0147         0.92±0.0091    -3.3067e+00  3.9231e-03   -1.4788
DecisionTree          0.62±0.0115         0.95±0.0029    -2.6364e+01  7.8077e-16   -11.7904
SVM                   0.94±0.0096         0.96±0.0033    -2.1863e+00  4.2242e-02   -0.9778
KAPPA
LogisticRegression    0.25±0.0194         0.07±0.0010    8.5592e+00   9.2409e-08   3.8278
DecisionTree          0.31±0.0274         0.05±0.0014    8.8089e+00   6.0548e-08   3.9395
SVM                   0.02±0.0139         0.06±0.0008    -2.7002e+00  1.4645e-02   -1.2076

7.3.3 Summary

As summarised in Figure 7.4, and contrary to our expectations, the experiments described in this section issue a warning about using random undersampling to balance classification problems. When the bias is severe, removing so much information is risky in most cases, and whether random undersampling can still help then depends on the individual dataset. However, as when predicting disengagement behaviours, a common observation in all three games concerns classifiers that must predict both classes well, as measured by the area under the ROC curve. For that target, random undersampling never had a significantly negative effect, and it brought significant improvements in three of the nine classifier and game combinations. Therefore, depending on the predictive target and the dataset, one should consider carefully whether random undersampling is the right technique.

So far, the experiments have shown that random undersampling helps in most cases when predicting disengagement and, with some risk, in some cases when predicting first purchases. Even for first-purchase prediction, however, Section 5.2.2 showed that the classifiers perform significantly better than random classifiers. As discussed in Section 6.4, the truly challenging situation arises when the total number of samples is not large enough, in which case the performance of classifiers built on higher-dimensional data representations may suffer. The experiments of Section 6.4 presented so far concern classifiers trained on datasets balanced by random undersampling. The next section therefore investigates classifiers trained on the raw imbalanced datasets, to determine whether random undersampling helps when the dataset is relatively small.