We conclude from the above comparison between Case Studies 1 and 2 that although the MCD identification methods are different (see Section 7.1), both case studies support that MCDs are highly concentrated in a few (about 20% of) components and (about 10% of) fix relationships in the systems (see Section 7.2). In particular, the complexity measures of the MCDs identified with the different methods in the two case studies are similar (see Section 7.3).
4 Here, we cannot conduct a statistical significance test for this “2.8 vs. 2.5” comparison because the sizes of the two arrays being compared are not equivalent. However, from a practical perspective, we can state that there is no substantial difference between 2.8 and 2.5 as measures of MCD complexity.
The comparison between Case Studies 1 and 2 indicates that the subset of MCDs identified purely from the defect dataset can represent the whole set of MCDs in the system (identified from the defect-fix dataset), at least in terms of their distribution and complexity characteristics. This suggests an alternative way to investigate MCDs when only defect records are available, and shows that the two case studies complement each other.
Apart from the differences in MCD identification, distribution, and complexity measurement (see Sections 7.1–7.3), we also note two essential differences between these two studies. First, the research goals are different. Case Study 1 aims at examining MCDs (their complexity and persistence) in order to understand architectural degeneration, whereas Case Study 2 aims at applying the DAD approach (see Chapter 5) and characterizing architectural degeneration. Therefore, their research questions are different (see Sections 4.1 and 6.1 for details). Second, the formats of the datasets under investigation are different. Case Study 1 investigates defect records, whereas Case Study 2 investigates both defect records and change logs; see Tables 4.1 and 6.1 for the key attributes of the datasets in the two case studies. Nevertheless, we faced similar challenges in conducting these two case studies, and learnt similar lessons from them; see Chapter 9 for these challenges and lessons.
Chapter 8
Critical Assessment
Recall that Chapters 4, 5 and 6 describe the three main parts of this thesis research: Case Study 1 (analysis of multiple-component defects, or MCDs), the DAD approach and its prototype tool, and Case Study 2 (DAD application). Here, we assess these three parts, focusing mainly on their limitations. First of all, we discuss the relationship between MCDs and architectural degeneration (see Figure 3.1), which is fundamental to this thesis research.
8.1 MCDs and Architectural Degeneration
We note from Section 3.1.2 that MCDs, due to their “multiple-component” nature, are related to potential crosscutting concerns (Eaddy et al., 2008) or architectural problems (von Mayrhauser et al., 2000) in the system. As Stringfellow et al. (2006) state, “problems related to interactions between components is a sign of problems with the software architecture of the system and are often costly to fix.” We thus propose that architectural degeneration manifests itself through MCDs (see Figure 3.1). Consequently, characteristics of MCDs such as their quantity and complexity can reflect the impact of architectural degeneration on software defects, and we can evaluate the architectural degeneration of a specific system by examining the quantity and complexity of MCDs in that system. This is the fundamental basis of this thesis research: the characterization and diagnosis of architectural degeneration. However, there are two issues related to this fundamental basis.
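As a minimal sketch of the idea above, the following Python fragment classifies defects as MCDs or single-component defects and counts MCDs per component. The record layout, the component names, and the function names are hypothetical and chosen only for illustration; the thesis's own metric definitions are given in Section 5.2.2 and are not reproduced here.

```python
from collections import Counter

# Hypothetical defect records: defect id -> components changed by its fix.
fixes = {
    "D1": {"ui", "db"},
    "D2": {"db"},
    "D3": {"ui", "db", "net"},
    "D4": {"net"},
}

def is_mcd(components):
    """Assumption: a defect counts as an MCD if its fix changes more than one component."""
    return len(components) > 1

def mcd_quantity_per_component(fixes):
    """Count, for each component, the MCDs whose fixes changed it."""
    counts = Counter()
    for components in fixes.values():
        if is_mcd(components):
            counts.update(components)
    return counts

quantity = mcd_quantity_per_component(fixes)
for component, count in quantity.most_common():
    print(component, count)
```

Under this reading, components with the highest counts would be the first candidates to examine for degeneration, subject to the two caveats discussed next.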
First, we acknowledge that architectural degeneration affects not only software defects but also other quality aspects such as maintainability, adaptability, and reusability (see an example quality model in McCall et al.’s study (1977)). Therefore, the defect perspective, as defined in this work, can characterize only one aspect of architectural degeneration, and we should not make cursory decisions on the architectural degeneration of a system based only on the MCD quantity and complexity measures of components and fix relationships.
Second, even from the defect perspective, architectural degeneration could relate to defects confined to only one component (so-called single-component defects, or SCDs). For example, there are SCDs whose corrections require only “examinations” of, but not “physical” changes to, more than one component. These defects are not identified as MCDs in this work, yet fixing them requires considering their potential impact on the architecture.
Basili and Perricone (1984) call these defects (defects requiring examinations in more than one system module) “interface” defects. The literature (e.g., Basili and Perricone, 1984; Perry and Evangelist, 1987; Nakajo and Kume, 1991) indicates that interface defects account for about 40%-65% of all defects (see Section 2.4.3). We note that MCDs are included in interface defects (as per Basili-Perricone’s definition). However, over 40% of interface defects could be defects that are not MCDs, and these are not considered in the proposed relationship between architectural degeneration and MCDs. This could therefore affect the diagnosis of architectural degeneration from the defect perspective. Unfortunately, the use of Basili-Perricone’s interface defect definition requires subtle “examination” (or soft) measures (as described in Section 2.4.3), which are not widely captured in actual software projects, including the systems under investigation.
8.2 Case Study 1: MCD Analysis
Built upon the relationship between MCDs and architectural degeneration (see Figure 3.1), Case Study 1 (see Chapter 4) investigates the distribution, complexity and persistence of MCDs in a large legacy system (of size 20 million SLOC), for the purpose of quantifying the extent to which architectural degeneration affects software defects. The defect dataset analyzed for this study covers 17 of over 20 years of the system and six of the nine major releases.
Results indicate that MCDs are concentrated in a few components in the system (see Figure 4.1). Results also indicate that MCDs are complex to fix and are persistent across development phases and releases (see Tables 4.3–4.5). Knowing these characteristics can help management and maintenance staff to focus on particular hard-to-fix defects. Moreover, the MCD profile reflects the adverse impact of architectural degeneration on software defects, mainly in terms of their fix complexity and difficulty. Knowing this can aid in understanding the architectural degeneration of the system and can also underline the necessity and significance of treating that degeneration.
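One simple way to quantify such concentration is to compute the share of MCD involvement accounted for by the most MCD-prone fraction of components. The sketch below assumes per-component MCD counts are already available; the counts, the component names, and the `top_share` helper are invented for illustration and are not data from the case study.

```python
import math

def top_share(mcd_counts, fraction=0.2):
    """Share of all MCD involvement held by the top `fraction` of components."""
    ranked = sorted(mcd_counts.values(), reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    # Always include at least one component in the "top" group.
    k = max(1, math.ceil(fraction * len(ranked)))
    return sum(ranked[:k]) / total

# Hypothetical MCD counts for ten components.
counts = {"c1": 40, "c2": 30, "c3": 5, "c4": 5, "c5": 5,
          "c6": 5, "c7": 4, "c8": 3, "c9": 2, "c10": 1}

share = top_share(counts, fraction=0.2)
print(f"Top 20% of components account for {share:.0%} of MCD involvement")
```

A share close to the fraction itself would indicate an even spread, whereas a much larger share indicates the kind of concentration reported in Figure 4.1.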
We note from Section 2.4.5 that little research has been conducted on the characteristics of MCDs, and from Section 4.7 that there are no studies in the literature similar to this case study. In particular, the findings on MCDs add to the current knowledge on architectural defects (e.g., the genre defined by Endres (1975) and Basili-Perricone (1984)) and degeneration.
While we have answered the three questions, (i)–(iii), posed in Section 4.1, we do not yet know whether the defect complexity metric (i.e., the number of accompanying changes required to fix a defect) reflects the real complexity of defects in the system. The defect dataset under investigation cannot support this validation. Thus, careful thought needs to be given to the design of such a validation study in other contexts.
components and the most frequently occurring fix relationships (see Figure 4.1) was not addressed. Understanding such an inter-relationship could support cost-effective system quality improvement. Another limitation of this case study is the lack of findings about the characteristics of the MCD-prone components. For example, we do not know from this study the extent to which these components tend to persist across phases and releases. Such characteristics can help improve the system quality.
8.3 DAD Approach and Tool
The DAD approach (see Chapter 5) aims at: (i) identifying degeneration-critical components and fix relationships, (ii) evaluating persistence of components and fix relationships, and (iii) evaluating architectural degeneration for a given system, using the MCD quantity and complexity metrics (as defined in Section 5.2.2). A conceptual DAD framework was proposed in order to carry out these three goals (see Figure 5.1). A prototype tool was developed to facilitate DAD application in real system contexts.
Note that the role of DAD and its prototype tool is to operationalize the defect perspective of architectural degeneration. In itself, DAD does not contribute directly to new theories, but it helps automate the process of discovery. The information on architectural degeneration derived with DAD can help treat the architectural degeneration problem in the system, which could lead to an increase in system quality and a decrease in maintenance costs. DAD can thus complement existing techniques for architectural degeneration diagnosis (see Section 2.3.3), such as architectural deviation detection (Murphy et al., 2001) (Lindvall and Muthig, 2008), defect-prone component (DPC) identification (Ohlsson and Wohlin, 1998) (Li et al., 2009), and fault architecture construction (von Mayrhauser et al., 2000). In particular, we note that von Mayrhauser et al. (2000) propose an approach to derive fault architectures from system defect history, which can highlight the degeneration-critical fix relationships in the architecture from the MCD quantity perspective. Fault architectures are similar to the defect architectures created with the DAD approach. However, DAD defines both MCD quantity and complexity metrics (see Section 5.2.2), which support creating defect architectures from both the MCD quantity and complexity perspectives. Therefore, we can say that a fault architecture is a type of defect architecture, and that there are defect architectures that are not fault architectures.
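The distinction between the quantity and complexity perspectives can be illustrated with a small sketch that derives fix relationships (unordered component pairs) from MCD records and attaches both measures to each pair. This is an illustrative assumption about how such a structure could be built, not the DAD tool's implementation: the record layout, the use of the number of changes as the complexity value, and all names are hypothetical, and the actual metrics are those defined in Section 5.2.2.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical MCD records: (components changed by the fix, number of changes).
mcds = [
    ({"ui", "db"}, 3),
    ({"ui", "db", "net"}, 5),
    ({"db", "net"}, 2),
]

def defect_architecture(mcds):
    """Map each component pair to its MCD quantity and summed fix complexity."""
    edges = defaultdict(lambda: {"quantity": 0, "complexity": 0})
    for components, n_changes in mcds:
        # Every pair of components changed together forms one fix relationship.
        for a, b in combinations(sorted(components), 2):
            edges[(a, b)]["quantity"] += 1
            edges[(a, b)]["complexity"] += n_changes
    return dict(edges)

arch = defect_architecture(mcds)
for pair, measures in sorted(arch.items()):
    print(pair, measures)
```

Ranking the pairs by `quantity` alone gives a view comparable to a fault architecture, whereas ranking by `complexity` can order the same relationships differently.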
However, DAD defines only a defect perspective for diagnosing architectural degeneration. It cannot support diagnosis from other perspectives such as architectural deviation (see Section 2.3.1). Moreover, even from the defect perspective, the MCD quantity and complexity metrics (see Section 5.2.2) cannot measure all characteristics of architectural degeneration. For example, these metrics do not take into account the severity of the defects caused by architectural degeneration. It could be that architectural degeneration leads to more severe defects, but DAD cannot measure this.
In addition, DAD examines only the architecture-level degeneration of the system. It cannot offer information about, for example, which code files contribute most to the “degeneration” of a component, or which fix relationships among code files occur frequently across phases and releases. This kind of information can help refine strategies and solutions for treating architectural degeneration.