EVALUACION - PROGRAMA DE TRANSFORMACIÓN PRODUCTIVA TÉRMINOS DE REFERENCIA

This section discusses the potential threats to the validity of the research results presented in this thesis. The section is grouped into internal validity, external validity, construct validity, and conclusion validity.

5.2.1 Internal Validity

Exploratory Study. The researcher was the data collection instrument in the exploratory study (qualitative interviews). Therefore, one potential threat to the internal validity of the exploratory study is the researcher's bias and his potentially inconsistent behavior across interview sessions. To mitigate this threat, the researcher used an interview guideline containing a list of important questions to be asked in every interview session.

Case Study. One shortcoming of the case study is that only one of the authors annotated the reference documents; more than one opinion would help to ensure internal validity. To reduce the possible negative effect of this shortcoming, the same author annotated the reference documents in a second pass.

Experiments. [...] testing effect [CSG63]. To avoid this issue, the participants were divided into two groups, and the two documents were swapped between the groups. As a result, in the second stage group 1 examined the sentences from the document that group 2 had examined in the first stage, and vice versa.
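The swap between groups and stages can be sketched in a few lines of Python. This is only an illustration of a two-group crossover: the function names, the group and stage labels, and the seeded random split are assumptions introduced here, since the text does not state how participants were allocated to the groups.

```python
import random


def split_into_two_groups(participants, seed=0):
    """Split participants into two groups of (nearly) equal size.

    Assumption: a seeded random split; the text only says the
    participants were divided into two groups.
    """
    shuffled = list(participants)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def crossover_plan(documents):
    """Return the document each group examines in each stage.

    The two documents are swapped between the groups in stage 2, so
    no group examines the same document twice.
    """
    first, second = documents  # exactly two documents are expected
    return {
        "stage 1": {"group 1": first, "group 2": second},
        "stage 2": {"group 1": second, "group 2": first},
    }


if __name__ == "__main__":
    plan = crossover_plan(("document A", "document B"))
    for stage, assignment in plan.items():
        print(stage, assignment)
```

Because each group sees each document exactly once, any learning carried over from the first stage applies to a different text in the second stage.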

Survey. The demographic questions at the beginning of the survey are designed to identify invalid answers so that these answers can be excluded from the analysis. In addition, the answers were checked manually, with a focus on the qualitative written answers, to find possible spam answers. However, the chance that some irrelevant participants gave answers, or that some participants became careless in their responses to the last scenarios of the survey, is not zero; this is a potential threat to internal validity.

5.2.2 External Validity

Exploratory Study. Generalizing the results to all large-scale enterprises on the basis of enterprises from a single industrial domain is arguable. Although large-scale enterprises from other domains, such as telecommunication, finance, or health care, also face challenges in making architectural decisions for enterprise application development, our study shows that software integration in the electricity industry is less mature than in other areas because of a lack of standardization, which can affect the architectural decision issues. We are therefore aware of this threat to external validity. Conducting the same interviews with enterprises from other domains would make the results more reliable.

Case Study. External validity is more difficult to attain in a single case study [Tel97]. However, to increase the statistical generalizability of the results [Yin14], we chose three large and architecturally significant documents from the six documents we received from the telecommunication company. The three documents belong to three different projects, and the authors of the first document differ from the authors of the other two. This selection ensured that the reference documents contained a sufficient number of architectural issues and varied in language style. Receiving more architecture-related documents from more companies would increase the external validity of the case study. We requested documents from more companies (inside and outside Norway), but either they had no architecture-related documents available in English, or they could not share them with us due to confidentiality concerns.

Experiments. In both experiments, the potential threat to external validity is the selection of the material (number, size, and domain) for the experiment. We were aware of this issue, and in the second experiment we therefore selected text from two documents of different types: one is domain literature on Smart Grids, and the other is a document from a telecommunication company. The portions of the documents were selected so that positive, negative, false positive, and false negative sentences are all present in the text, as explained in Paper 4. Evaluating the framework on larger documents from more diverse domains would have required the experts to spend much more time on the experiment, which was not feasible.

Survey. A small number of respondents is often a potential threat to the external validity of surveys. However, different numbers of raters have been suggested for inter-rater agreement studies to ensure adequate precision in the results [Gwe10, LS07]. According to Gwet [Gwe10], 40 raters are enough even when the anticipated coefficient of variation for percent agreement is 5%.

5.2.3 Construct Validity

Experiments. One potential threat to the construct validity of the experiments is that participants might interpret an "architectural issue" differently from the researchers. In that case, what the participants annotated as architectural issues in the experiment material might not be what the researchers meant by the term. To mitigate this threat, both experiments included an introduction stage that defined an "architectural issue" with some concrete examples.

Survey. Similarly, a potential threat to the construct validity of the survey is a possibly inconsistent interpretation of "architectural issue" and "quality attribute" between researchers and participants. The quality attributes were defined in the introduction of the survey based on a well-known standard quality model. "Architectural issue" was also defined and explained in the introduction of the survey. In addition, the survey was first conducted with some experts, and their feedback was applied in designing the final version of the survey questionnaire to mitigate the threat to construct validity.

5.2.4 Conclusion Validity

Experiments. The results of the experiment with students were not analyzed with any statistical methods, and therefore the conclusion we draw from the first experiment is subject to a potential threat to validity. To reduce this threat in the second experiment, we applied statistical methods to test both the normality of the data distribution and the reliability of the comparison between the results of the three extraction methods.

Survey. The survey would be subject to a threat to conclusion validity if no reliability test had been conducted on the data. To increase conclusion validity, the ICC and Krippendorff's alpha tests were used, which take reliability into account in addition to measuring agreement.
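To illustrate how a reliability coefficient differs from raw percent agreement, the following is a minimal sketch of Krippendorff's alpha, which corrects the observed disagreement for the disagreement expected by chance. It assumes the nominal variant of the coefficient and uses made-up ratings; the text does not state which level of measurement or which tooling was used for the survey data, and the function name is hypothetical.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal ratings.

    units: one list of category labels per rated item, one label per
    rater; None marks a missing rating. Items with fewer than two
    ratings carry no agreement information and are skipped.
    """
    # Coincidence matrix: each ordered pair of ratings within an item
    # contributes 1 / (m - 1), where m is the number of ratings there.
    coincidences = Counter()
    for ratings in units:
        values = [r for r in ratings if r is not None]
        m = len(values)
        if m < 2:
            continue
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per category and total number of pairable ratings.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one item rated by two raters")

    # Both terms omit the common factor 1/n, which cancels in the ratio.
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("only one category in use; alpha is undefined")
    return 1.0 - observed / expected


if __name__ == "__main__":
    # Hypothetical data: four items, two raters each.
    ratings = [["a", "a"], ["a", "b"], ["b", "b"], ["b", "b"]]
    print(krippendorff_alpha_nominal(ratings))
```

In this toy data the two raters agree on three of four items (75% agreement), while alpha is lower because part of that agreement would be expected by chance alone.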
