Secondly, we trained the decision tree on the code metrics of the functions of the preceding release to predict functions in the following release as (possibly) “faulty” or “not-faulty” [21]. For example, we trained the decision tree on the code metrics of the functions of release 1 of the “Flex” program to predict faulty functions in release 2 of the “Flex” program. Similarly, we used releases 1 and 2 to predict the possible faulty functions in release 3. In general, we trained the decision tree on the code metrics of releases 1 to n-1 to identify the possible faulty functions (or suspected functions) in release n.
[21] We trained a single decision tree in this case because the dependent variable has only two categories: faulty and not-faulty.
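The release-wise training scheme described above can be sketched as follows. This is only an illustration: the function name `release_splits` and the toy rows are hypothetical, and each release is assumed to be a plain list of per-function records (code metrics plus label).

```python
def release_splits(releases):
    """Yield (n, train, test) for n = 2 .. len(releases).

    `releases` is a list where releases[i] holds the per-function records of
    release i+1. The training set for release n pools releases 1 .. n-1.
    """
    for idx in range(1, len(releases)):
        # Pool every function record from all earlier releases.
        train = [row for release in releases[:idx] for row in release]
        test = releases[idx]
        yield idx + 1, train, test


# Toy records (hypothetical): just function names tagged with a release number.
toy_releases = [["scan_r1", "parse_r1"], ["scan_r2"], ["scan_r3", "parse_r3"]]
for n, train, test in release_splits(toy_releases):
    print(f"release {n}: train on {len(train)} functions, predict {len(test)}")
```

Release 1 has no preceding release, so it only ever appears as training data.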
The above approach of training the decision tree to predict faulty functions results in high accuracy with few errors (incorrect predictions) of “faulty” and “not-faulty” functions. For example, usually 20% of the functions in a program are “faulty” and the rest (the majority) are “not-faulty”, so a decision tree classifier becomes biased towards “not-faulty” functions. This biased decision tree classifier would predict suspected functions with approximately 90% accuracy, with around 10% of “faulty” functions predicted as “not-faulty” and about 1-2% of “not-faulty” functions predicted as “faulty”.
In this case, however, the cost of predicting a “faulty” function as “not-faulty” (false negative) is much higher than the cost of predicting a “not-faulty” function as “faulty” (false positive). This is because if a function is not identified as “faulty” for the release of a program, then we cannot generate mutants for that function. If there is no mutant, then there will be no failed traces for the faulty function, and the faulty function cannot be predicted in the actual failed trace. On the other hand, a few extra false positives will only result in the generation of mutants of a few more “not-faulty” functions and will not adversely affect the accuracy of prediction of faulty functions in failed traces. For example, if we get around 70% accurate predictions of (suspected) faulty and not-faulty functions from the decision tree, with hardly any false negatives and about 30% false positives, then we can generate mutant traces of all the (to be) faulty functions and use those traces to identify faulty functions in new traces; however, if we do not have mutant traces of the suspected functions, we cannot identify those functions in new failed traces.
We therefore use the cost-sensitive learning strategy (Ting, 2002; Witten and Frank, 2005) to train the decision tree on the code metrics. Costs in cost-sensitive learning are values that force the decision tree to make fewer errors on one type of prediction (e.g., faulty) than on the other (i.e., not-faulty). Suppose ‘Cf’ is the cost of misclassifying a function as “faulty” and ‘Cnf’ is the cost of misclassifying a function as “not-faulty”.
Training instances belonging to the “faulty” category are assigned weights according to the cost ‘Cnf’, and training instances belonging to the “not-faulty” category are assigned weights according to the cost ‘Cf’ [22]. The decision tree is then trained with the normal procedure on the training set, except that the new instance weights are used instead of the normal unit weights (Ting, 2002).
For example, setting ‘Cf’ to 1 and ‘Cnf’ to 20 means that the cost of misclassifying a function as “not-faulty” is 20 times the cost of misclassifying it as “faulty”. Hence, the weights of the training instances belonging to the “faulty” class will be 20 times the weights of the instances of the “not-faulty” class.
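To make the weighting concrete, here is a minimal sketch of how such instance weights could be derived. It assumes that weights are simply proportional to the class cost and rescaled so that the total weight equals the number of training instances. That normalization is an assumption made for illustration; the exact equation is the one given in Ting (2002) and may differ. The function name `instance_weights` is hypothetical.

```python
from collections import Counter


def instance_weights(labels, cost_f=1.0, cost_nf=20.0):
    """Return one weight per training instance.

    Faulty instances carry the cost Cnf (the cost of wrongly calling them
    not-faulty); not-faulty instances carry the cost Cf.
    Assumption: weights are rescaled so they sum to len(labels).
    """
    cost = {"faulty": cost_nf, "not-faulty": cost_f}
    counts = Counter(labels)
    total_cost = sum(cost[label] * n for label, n in counts.items())
    scale = len(labels) / total_cost
    return [cost[label] * scale for label in labels]


labels = ["faulty"] * 20 + ["not-faulty"] * 80
weights = instance_weights(labels, cost_f=1.0, cost_nf=20.0)
print("faulty weight:", weights[0], "not-faulty weight:", weights[-1])
```

With Cf = 1 and Cnf = 20, every faulty instance ends up with 20 times the weight of a not-faulty one, while the overall weight of the training set is unchanged.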
Thus, in this step, we first generate a training set with a high cost for misclassifying functions as “not-faulty” (‘Cnf’) and a low cost for misclassifying functions as “faulty” (‘Cf’). The selection of the cost ratio depends on the subjective judgment of the user for a particular problem (Witten and Frank, 2005). We developed our own criteria for selecting the cost values: (a) we selected those cost values at which approximately 70% of faulty functions were correctly predicted as faulty in the training set (prior releases); and (b) we used the training set with these identified cost values to predict the expected faulty functions in the test set (current release) using the decision tree. We selected the threshold value of 70% for training sets because at this level we found that the majority of faulty functions in the test sets were correctly identified, with fewer false negatives (incorrect not-faulty predictions) and not many false positives.
For example, if 20 functions are faulty in a training set and 80 functions are not-faulty, then we select those cost values at which 12-15 faulty functions are correctly predicted as faulty in the training set. This will also result in 10-30 not-faulty functions predicted as faulty (false positives). Overall, there will be a few more suspected (to be faulty) functions (i.e., including true positives and false positives) but fewer faulty functions incorrectly predicted as not-faulty. Note that correct prediction of 12-15 faulty functions is approximately, but not exactly, 70%. We choose cost ratios such that this value remains around 70% because sometimes the selection of two adjacent cost values can make all the functions or the majority of the functions predicted as faulty (e.g., [Cf = 1, Cnf = 30] and [Cf = 1, Cnf = 40] can have such an effect, and selecting any other cost value in between them could result in the same predictions as [Cf = 1, Cnf = 30]). Thus, we select the cost values by setting a threshold of around 70% for correct predictions of suspected (to be faulty) functions in the training set, such that only a small proportion of false positives is predicted (i.e., not-faulty predicted as faulty). This criterion will help maintainers in identifying suitable cost ratios for their software systems. An example execution of F007-plus is shown in Section 3.5.5.
[22] Actual equation to measure the weights using cost can be found in Kai Ming Ting’s paper (Ting, 2002). We used the “cost sensitive learning” algorithm in the Weka API (Witten and Frank, 2005) which
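The selection criterion can be sketched as a small search over candidate ‘Cnf’ values with ‘Cf’ fixed at 1. This is a sketch under stated assumptions rather than the procedure actually implemented: the rule “closest to the 70% target, ties broken by fewer false positives” is one possible reading of the criterion, `predict_with_cnf` is a hypothetical callable standing in for training the cost-sensitive decision tree and predicting on the training set, and the toy outcome table is invented purely to make the example runnable.

```python
def recall_and_false_positives(actual, predicted):
    """Return (fraction of faulty functions caught, number of false positives)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == "faulty" and p == "faulty")
    fn = sum(1 for a, p in pairs if a == "faulty" and p == "not-faulty")
    fp = sum(1 for a, p in pairs if a == "not-faulty" and p == "faulty")
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return recall, fp


def select_cnf(candidates, predict_with_cnf, actual, target=0.70):
    """Pick the Cnf whose training-set recall is closest to `target`.

    Assumption: ties are broken in favour of fewer false positives.
    Returns (cnf, recall, false_positives).
    """
    best = None
    for cnf in candidates:
        recall, fp = recall_and_false_positives(actual, predict_with_cnf(cnf))
        key = (abs(recall - target), fp)
        if best is None or key < best[0]:
            best = (key, cnf, recall, fp)
    if best is None:
        raise ValueError("no candidate cost values given")
    return best[1], best[2], best[3]


# Toy stand-in for the classifier: 20 faulty and 80 not-faulty functions.
ACTUAL = ["faulty"] * 20 + ["not-faulty"] * 80
# Invented outcomes: Cnf -> (faulty functions caught, false positives).
TOY_OUTCOMES = {10: (8, 5), 20: (14, 20), 30: (20, 70)}


def toy_predict(cnf):
    caught, false_pos = TOY_OUTCOMES[cnf]
    return (["faulty"] * caught + ["not-faulty"] * (20 - caught)
            + ["faulty"] * false_pos + ["not-faulty"] * (80 - false_pos))


print(select_cnf([10, 20, 30], toy_predict, ACTUAL))
```

In the toy table, the largest candidate flags almost every function as faulty, which mirrors the jump between adjacent cost values described above; the search settles on the candidate that catches 14 of the 20 faulty functions.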