Table 4.3: Selected Predicates for gzip-1.2.3

Predicate ID  Filename   Line Num.  Predicate
P1078         bits.c     165        (bi_valid > 8) == true
P1190         gzip.c     590        (force) == false
P1210         gzip.c     667        (verbose) == true
P1136         deflate.c  615        (match_length <= max_insert_length) == true
P1137         deflate.c  625        (match_length != 0) == true

We now report on the second case study, with gzip-1.2.3. The gzip program and the accompanying test suite of 217 test cases are also obtained from the “Subject Infrastructure Repository”. The gzip program has 6,184 lines of C and assembly code, and is instrumented with 808 boolean and 1,071 return predicates. Some predicates that will be referred to later are presented in Table 4.3.

Two “subclause-missing” faults are injected into the gzip program, as shown in Figure 4.8. The two faults cause 65 and 17 of the 217 test cases to fail, respectively. Because gzip is an independent case study from grep, the two faults are still denoted by Fault 1 and Fault 2, respectively. Similarly, we use F1 to refer to the 65 failing cases due to Fault 1, and F2 for the 17 failing cases due to Fault 2. When both faults are activated, 82 test cases fail, which are exactly the union of F1 and F2. Since 65 + 17 = 82, F1 and F2 are disjoint.

Figure 4.8: Two Injected Faults in gzip-1.2.3

the failing traces in F1 and F2, respectively.

Figure 4.9: Proximity Graphs with R-Proximity (left) and T-Proximity (right) for gzip-1.2.3

The proximity among the 82 failing traces is plotted under R-Proximity (left) and T-Proximity (right) in Figure 4.9. Red crosses represent the 65 failing cases in F1, and blue circles represent the 17 cases in F2. In the left subfigure, the two clusters can be identified without ambiguity. This clustering is nearly perfect because both clusters are pure and well separated from each other. In comparison, under T-Proximity, there are two distinct subclusters of red crosses, and one blue circle lies far from the other circles. This again shows that failing traces due to the same fault can exhibit quite divergent behaviors.

[Spectrum graphs; x-axis: Predicate Index, y-axis: Frequency. Annotated points: (a) top-1: predicate 1190, frequency 25; (b) top-2: predicate 1078, frequency 39; (c) top-3: predicate 1078, frequency 51.]

Figure 4.10: Case Study with gzip-1.2.3: Top-k Spectrum Graphs for Cluster1

[Spectrum graphs; x-axis: Predicate Index, y-axis: Frequency. Annotated points: (a) top-1: predicate 1136, frequency 17; (b) top-2: predicate 1136, frequency 17; (c) top-3: predicate 1136, frequency 17.]

Figure 4.11: Case Study with gzip-1.2.3: Top-k Spectrum Graphs for Cluster2

After the two clusters are identified, appropriate developers can be found for each cluster by inspecting the spectrum graphs. Figures 4.10 and 4.11 present the spectrum graphs for Cluster1 and Cluster2, respectively, with k varying from 1 to 3. Similar to what was observed in the grep case study, only a limited number of predicates are favored in each cluster, and the set of favored predicates is insensitive to the setting of k.
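As a rough illustration of how a top-k spectrum could be tallied, the sketch below counts, for each predicate, how many member rankings place it among their first k entries. It assumes each trace's ranking is available as a list of predicate IDs, best first. The function name `top_k_spectrum` and the three-trace toy cluster are hypothetical and are not data from the case study.

```python
from collections import Counter

def top_k_spectrum(rankings, k):
    """Count, for each predicate, how many rankings place it in their top k."""
    counts = Counter()
    for ranking in rankings:
        # set() guards against counting a predicate twice for one trace
        counts.update(set(ranking[:k]))
    return counts

# Hypothetical toy cluster (not from the gzip study): one ranking per
# failing trace, each a list of predicate IDs ordered best first.
toy_cluster = [
    ["P1136", "P1137", "P1078"],
    ["P1136", "P1137", "P1190"],
    ["P1136", "P1210", "P1137"],
]

for k in (1, 2, 3):
    print(k, dict(top_k_spectrum(toy_cluster, k)))
```

Plotting these counts against the predicate index would give a bar chart of the same kind as the spectrum graphs; a cluster whose counts concentrate on a few predicates for every k matches the "insensitive to k" observation.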

Specifically, Figure 4.10 suggests that three predicates, P1078, P1190 and P1210, are most favored by the member traces in Cluster1. Because predicate P1078 points to the function bi_windup and the other two predicates point to the function treat_stdin, the 65 failing cases in Cluster1 are assigned to the developers in charge of treat_stdin and bi_windup. Because the faulty function deflate lies on the call chain from treat_stdin to bi_windup, the assigned developers are appropriate.

The assignment of failing cases in Cluster2 is similarly straightforward. Figure 4.11 shows that all 17 member traces rank predicate P1136 at the top, and that 16 of them rank P1137 second. The two predicates are too close to be distinguished in Figure 4.11(b). Because P1136 and P1137 are the most favored predicates, the failing traces in Cluster2 are assigned to the developers in charge of deflate_fast, which is exactly the faulty function.

Finally, let us examine whether the proximity graph under R-Proximity can help developers find the two faults. Unlike in the grep case study, the left subfigure of Figure 4.9 shows that the debugging result τ is far from all individual rankings. This means that no single failing case can account for the debugging result, and consequently τ is regarded as less accurate. In fact, the top-3 predicates of τ all point to the function send_tree. Because we are unfamiliar with the code of gzip, and no single failing case can explain τ, we could not explain how the three predicates relate to the faults.
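The check "τ is far from all individual rankings" can be sketched numerically. This section does not restate the definition of R-Proximity, so the sketch below substitutes the plain normalized Kendall tau distance (the fraction of predicate pairs that two rankings order differently) as a stand-in. The function names, the predicate labels A to D, and the toy rankings are all hypothetical.

```python
from itertools import combinations

def kendall_tau_distance(rank_a, rank_b):
    """Fraction of item pairs that the two rankings order differently.

    Stand-in for R-Proximity (an assumption); both rankings must
    contain the same items, best first.
    """
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    pairs = list(combinations(rank_a, 2))
    discordant = sum(
        1 for x, y in pairs
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )
    return discordant / len(pairs)

def min_distance_to_members(tau, member_rankings):
    """Distance from the aggregate result tau to its closest individual ranking."""
    return min(kendall_tau_distance(tau, r) for r in member_rankings)

# Hypothetical toy data: an aggregate ranking and two individual rankings.
tau = ["A", "B", "C", "D"]
members = [["D", "C", "B", "A"], ["C", "D", "A", "B"]]

print(min_distance_to_members(tau, members))
```

A large minimum distance would correspond to the situation described above, where no single failing case accounts for τ; what counts as "large" is a judgment call that this sketch does not settle.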

The reason why τ is far from all individual rankings is probably that Fault 1 and Fault 2 are semantically similar. Specifically, Fault 1 and Fault 2 are in the functions deflate and deflate_fast, respectively, and the two functions implement a similar task, differing only in efficiency in certain situations. As a result, the abnormal behaviors due to Fault 1 and Fault 2 are intertwined, and Sober cannot separate the abnormal behaviors due to each fault. Consequently, Sober generates the debugging result τ, which seems irrelevant to either fault. Such intertwined situations were not observed in the case study of grep, nor in previous studies [64, 59], because the faults there were semantically distant, and the abnormal behavior due to a particular fault outweighed that due to the other faults. This unsuccessful experience with Sober, together with the above discussion, underlines the importance of properly clustering failing traces before fault analysis.

When the debugging result is found to be less accurate, i.e., far from all individual rankings, there are still other options for developers to explore. For example, the developers can re-run Sober on each identified failure cluster, and investigate the fault in each cluster. We applied Sober to Cluster1 and Cluster2 separately, and accurate debugging results were obtained for both clusters. As another alternative, if the member traces in a failure cluster agree strongly about the fault location, i.e., are densely clustered under R-Proximity, one can choose a proper failing case to debug based on the individual rankings. For example, since the traces in Cluster2 agree strongly on the fault location, a developer can easily find a failing case that ranks P1136 and P1137 at the top, and start debugging. The principle is that the proximity graph under R-Proximity visualizes the relationship between the debugging result and each individual failing trace, and one can utilize it in different ways.
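Picking a failing case to debug from the individual rankings could look like the following minimal sketch. It assumes a mapping from failing-case IDs to their predicate rankings (best first). The function name `cases_ranking_at_top` and the case IDs are hypothetical, and the toy rankings are illustrative rather than taken from the study.

```python
def cases_ranking_at_top(rankings_by_case, wanted):
    """Return the failing cases whose leading predicates are exactly `wanted`.

    The first len(wanted) entries of each ranking are compared as a set,
    so the order among the wanted predicates does not matter.
    """
    wanted = set(wanted)
    k = len(wanted)
    return [
        case for case, ranking in rankings_by_case.items()
        if set(ranking[:k]) == wanted
    ]

# Hypothetical toy rankings keyed by made-up failing-case IDs.
toy_rankings = {
    "case_a": ["P1136", "P1137", "P1078"],
    "case_b": ["P1137", "P1136", "P1190"],
    "case_c": ["P1136", "P1210", "P1137"],
}

candidates = cases_ranking_at_top(toy_rankings, ["P1136", "P1137"])
print(candidates)
```

Any case returned this way is a reasonable starting point, since its own ranking already points at the predicates the cluster agrees on.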
