El turismo en Chile - CHILE PROGRAMA DE FOMENTO AL TURISMO

We decided to apply a goal-oriented approach to the detection of anti-patterns, following the GQM approach [BW84]. Following the work of Moha et al., we determined how three types of anti-patterns could be identified: Blobs, large classes that do too much; Spaghetti Code, large, badly structured classes; and Functional Decomposition, classes that are only built to provide functions. For every anti-pattern, our goal is its detection. The questions correspond to the different symptoms that are described in the anti-pattern reference by Brown et al. [BMB+98]. Finally, we selected metrics to evaluate every question. The choice of metrics was guided by two sources of information. First, if Brown et al. described a specific metric, then it was chosen. If not, we reused metrics used by Moha et al. in their rules. If they ignored the symptom, then we defined our own metric. The result of our methodology is presented in Tables 4.I, 4.II and 4.III.

Goal: Identify Blobs

Definition: Class that knows or does too much

| Question | Metric |
|---|---|
| B1 Is the class a large class? | Number of methods and attributes declared |
| B2 Is the class a controller class? | Presence of names indicative of control: Process, Manage... (terms used in [MGDM10]) |
| B3 Is the class not cohesive? | LCOM5 is high |
| B4 Does the class use data classes? | Number of classes used that contain 90% accessors |

Table 4.I: GQM applied to Blobs

Goal: Identify Spaghetti Code

Definition: Code that does not use appropriate structural mechanisms

| Question | Metric |
|---|---|
| S1 Does the class have long methods? | Maximum LOCs in methods is high |
| S2 Does the class have methods with no parameters? | # methods with no parameters is high |
| S3 Does the class use global variables? | # global variable accesses is high |
| S4 Does the class not use polymorphism? | % of non-polymorphic method calls is high |
| S5 Are the names of the class indicative of procedural programming? | Presence of names Process, Init, Exec, Handle, Calculate, Make... (terms used in [MGDM10]) |

Table 4.II: GQM applied to Spaghetti Code

4.2.3.1 Operationalising the Model

In Figure 4.3, we present the detection model corresponding to the Blob. There are three levels to the detection process: the bottom corresponds to the metrics extracted, the middle to the questions, and the top to the goal. Between every pair of adjacent levels, an operation is defined. To pass from a metric to a symptom, we transform the metric into a probability distribution. To pass from symptoms to a quality assessment, we apply Bayesian inference (i.e., P(Blob|Symptoms)). A quality engineer or a developer would likely not want a yes-or-no answer on whether a class is a Blob, as DECOR provides, but would rather know which classes present the highest risk of being a Blob.

Goal: Identify Functional Decompositions

Definition: Object-oriented code that is structured as function calls

| Question | Metric |
|---|---|
| F1 Does the class use functional names? | Same as S5 in Table 4.II |
| F2 Does the class use object-oriented mechanisms (no inheritance, no polymorphism)? | # overridden methods > 0 or DIT > 1 |
| F3 Does the class use classes with functional names? | % of invocations to classes/methods with functional names (like F1) |
| F4 Does the class declare a single method? | # methods declared = 1 |
| F5 Are all the class attributes private? | % of private attributes = 100% |

Table 4.III: GQM applied to Functional Decomposition

Figure 4.3: GQM applied to the Blob

Converting Metrics to Distributions

To compute the probability distributions of the symptoms, we first discretised metric values into three different levels: “low”, “medium”, and “high”. We used a box-plot to perform the discretisation. A box-plot, also known as a box-and-whisker plot, is used to single out the statistical particularities of a distribution and allows for a simple identification of abnormally high or low values. It flags as outliers any value outside [Q1 − 1.5 × IQ, Q3 + 1.5 × IQ], where Q1 and Q3 are respectively the first and third quartiles, and IQ is the inter-quartile range (Q3 − Q1). Figure 4.4 illustrates the box-plot and the thresholds that it defines: LQ and UQ correspond respectively to the lower and upper quartiles, and LOut and UOut to the lower and upper thresholds for outliers.
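The four box-plot thresholds described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `boxplot_thresholds` is hypothetical, and the quartile convention (the "inclusive" method of the standard `statistics` module) is a choice made here, since the text does not say how quartiles were estimated.

```python
import statistics


def boxplot_thresholds(values):
    """Return (LOut, LQ, UQ, UOut) for the metric values of one program.

    LQ and UQ are the first and third quartiles; LOut and UOut are the
    outlier thresholds Q1 - 1.5 * IQ and Q3 + 1.5 * IQ.
    Assumption: quartiles use the "inclusive" interpolation method.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iq = q3 - q1  # inter-quartile range
    return (q1 - 1.5 * iq, q1, q3, q3 + 1.5 * iq)


# Example with made-up metric values (e.g., number of methods per class).
print(boxplot_thresholds([1, 2, 3, 4, 5, 6, 7, 8, 9]))
# (-3.0, 3.0, 7.0, 13.0)
```

A different quartile convention would shift the thresholds slightly, so the same convention should be used consistently for every metric.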

The probability distribution of the symptom is computed as follows:

• For symptoms captured by metric values, we use three groups (“low”, “medium”, “high”) and estimate the probability that a quality analyst would consider a metric value as belonging to each group. Limiting the number of groups to three simplifies the interpretation of the detection results. For each metric value, the probability is derived from the relative distance between the value and its surrounding thresholds, as in a fuzzification process. The probability is interpolated linearly, as presented in Figure 4.4, which shows the probability density function corresponding to our statistical analysis of metric values.

Figure 4.4: Probability interpolation for metrics

• For symptoms describing class names, probabilities are either 1 or 0, depending on whether the name contains a term or not. According to this rule, the probability that a class named MakeFoo has a functional name is 1 (P(FunctionalName = True) = 1). For method names, we treated the number of methods containing the term as a metric and used the box-plot to interpolate a probability.

• For symptoms that determine the strength of relationships, the probabilities are calculated using the number of such relations (e.g., the number of data classes with which a class is associated). The more a class is associated with data classes, the more likely it is a Blob. In our example, we consider classes in which over 90% of the methods are accessors (methods that return or set the value of an attribute) to be data classes. To convert this count to a probability, its value is interpolated between 0 and N, where N is the upper outlier value observed in the program. If the upper outlier value of the number of data classes in a system were 11, then a class associated with 6 data classes would have a probability of about 55% (6/11) of being considered a controller (P(ControllingBehaviour = T) ≈ 55%).
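The three conversion rules in the list above can be sketched as follows. This is a hedged illustration, not the authors' implementation: all function names are hypothetical, the exact shape of the interpolation is defined by Figure 4.4 and is assumed here to be a linear ramp between a quartile and its outlier threshold, and name matching is assumed to be a case-insensitive substring test on the terms listed in Table 4.II.

```python
# Terms taken from Table 4.II; the matching strategy is an assumption.
FUNCTIONAL_TERMS = ("Process", "Init", "Exec", "Handle", "Calculate", "Make")


def clamp01(x):
    """Restrict a value to the interval [0, 1]."""
    return max(0.0, min(1.0, x))


def level_probabilities(value, l_out, lq, uq, u_out):
    """Return P(low), P(medium), P(high) for one metric value.

    Assumed shape: P(high) rises linearly from 0 at UQ to 1 at UOut,
    P(low) rises linearly from 0 at LQ to 1 at LOut, and P(medium)
    is whatever probability mass remains.
    """
    p_high = clamp01((value - uq) / (u_out - uq)) if u_out > uq else float(value > uq)
    p_low = clamp01((lq - value) / (lq - l_out)) if lq > l_out else float(value < lq)
    return {"low": p_low, "medium": 1.0 - p_low - p_high, "high": p_high}


def name_probability(class_name, terms=FUNCTIONAL_TERMS):
    """Return 1.0 if the class name contains one of the terms, else 0.0."""
    lowered = class_name.lower()
    return 1.0 if any(term.lower() in lowered for term in terms) else 0.0


def count_probability(count, upper_outlier):
    """Interpolate a relation count linearly between 0 and the upper outlier N."""
    if upper_outlier <= 0:
        return 0.0
    return clamp01(count / upper_outlier)


# Made-up thresholds (LOut, LQ, UQ, UOut) = (-3, 3, 7, 13).
print(level_probabilities(10, -3.0, 3.0, 7.0, 13.0))
# {'low': 0.0, 'medium': 0.5, 'high': 0.5}
print(name_probability("MakeFoo"))  # 1.0
print(count_probability(5, 10))     # 0.5
```

Because LQ never exceeds UQ, a value cannot be partly "low" and partly "high" at once, so the three probabilities always sum to 1.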

Evaluating Anti-pattern Probabilities

The probability of a class being an anti-pattern is inferred from the probabilities of its symptoms using Bayes’ theorem. Every output node has a conditional probability table that describes the decision given a set of inputs. In our example, the probability of a class being a Blob depends on four symptoms. We can use previously tagged data to fill a conditional probability table describing all possible combinations of symptoms, with entries such as P(Blob|Size = high, Cohesion = low, ContrBehav = high, ContrLing = T). When executing the model, the actual probability distributions of all symptoms are used to evaluate the probability of a class being a Blob (4.1), as we assume the independence of all symptoms.

$$
P(\mathit{Blob}) = \sum_{\mathit{Symptoms} \,\in\, \text{all combinations}} P(\mathit{Blob} \mid \mathit{Symptoms}) \times P(\mathit{Symptoms}) \tag{4.1}
$$
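Equation 4.1 can be sketched in Python as a sum over every combination of symptom states. This is a minimal illustration under stated assumptions: the function name `anti_pattern_probability` is hypothetical, the example uses only two symptoms to stay short, and the conditional probability table contains made-up values rather than values learned from tagged data. Following the independence assumption stated in the text, P(Symptoms) is computed as the product of the individual symptom probabilities.

```python
from itertools import product


def anti_pattern_probability(symptom_dists, cpt):
    """Evaluate equation 4.1 for one class.

    symptom_dists maps each symptom name to a {state: probability} dict.
    cpt maps a tuple of states (one per symptom, in the same order as
    symptom_dists) to P(anti-pattern | those states).
    """
    names = list(symptom_dists)
    total = 0.0
    for combo in product(*(symptom_dists[name].items() for name in names)):
        states = tuple(state for state, _ in combo)
        # Independence assumption: P(Symptoms) is a product of marginals.
        p_symptoms = 1.0
        for _, probability in combo:
            p_symptoms *= probability
        total += cpt[states] * p_symptoms
    return total


# Hypothetical distributions for two symptoms of one class.
symptoms = {
    "Size": {"high": 0.5, "not_high": 0.5},
    "ContrLing": {True: 1.0, False: 0.0},
}
# Hypothetical conditional probability table P(Blob | Size, ContrLing).
blob_cpt = {
    ("high", True): 0.9,
    ("high", False): 0.4,
    ("not_high", True): 0.3,
    ("not_high", False): 0.05,
}
p_blob = anti_pattern_probability(symptoms, blob_cpt)
print(round(p_blob, 3))  # 0.6
```

With the four Blob symptoms and three-level metrics, the table grows to one entry per combination of states, which is why the text fills it from previously tagged data rather than by hand.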
