
This section discusses internal, external, and construct threats to the validity of the experimental results.

8.9.1 Threats to Internal Validity

Internal validity pertains to the certainty of the cause and effect relationship between the independent variables and the dependent variables in the experiment, when the control variables remain unchanged during the experiment. The control variables, dependent variables, and independent variables involved in the experiments are listed in Table 36.

No.  Variable type  Variable
1.   Control        Hardware configuration – see APPENDIX B
2.   Control        Software configuration – see APPENDIX B
3.   Control        Software under test (Subject)
4.   Control        Experimental profile (BNF specification) (Treatment)
5.   Control        Initial state of the database
6.   Control        Person conducting the experiment (Researcher)
7.   Control        Physical environment (location of PC)
8.   Control        Settings of DBMS configuration variables
9.   Control        Time and date of experiment
10.  Control        Versions of Perl programs, batch files and shell scripts
11.  Dependent      Execution time (sec)
12.  Dependent      Number of statements accepted
13.  Dependent      Number of statements failed
14.  Dependent      Number of statements rejected
15.  Dependent      Number of statements succeeded
16.  Independent    Sequence of SQL statements executed
17.  Independent    Seed value (initialise random number generation)

Table 36: Variables involved in the experiments

Threats to internal validity arise where the control variables have changed between experimental runs, or where a control variable in the experiment has not been taken into account.

The hardware and software configuration, physical environment, and researcher have not changed over the course of the present research. The reported experiments were carried out over a period of six years between 2005 and 2011, and some changes to the experimental profiles (BNF specifications), Perl programs, batch files and shell scripts occurred over this time. However, these would not have changed during a single repetition of an experiment. Changes to experimental variables were recorded in the author’s research logbook, along with the experimental results, for future reference, and the program files contain a version history in comments.

Note that the researcher conducting the experiment, while a control variable, is also a source of (possibly unconscious) bias. Such bias might include the choice of literature sources, the choice of experimental subjects, choice of methodology, choice of metrics, choice of analysis technique, and interpretation of results.

The best defence against these threats to internal validity is probably to replicate the experiments; however, this was not possible within the resources and timescales of the present research.

A further threat to internal validity is the power of the experiments, that is, the level of statistical confidence that can be assigned to the results. In the present research, 95% confidence intervals for the mean were estimated for the experimental results, and in some instances these showed that the experiment lacked sufficient power to provide confidence at this level. The power of the experiments might be improved by increasing the number of experimental runs; however, as noted by Korver (1994) and discussed in Chapter 2, the sample size must be increased by a factor of k² to reduce the standard error by a factor of k.
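The relationship between sample size and standard error can be illustrated with a minimal sketch. It is not the analysis code used in the present research: the function names are hypothetical, and the interval uses a normal approximation (z ≈ 1.96), whereas a Student's t quantile would give a somewhat wider interval for a small number of runs.

```python
import math
import statistics


def standard_error(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def confidence_interval_95(sample):
    """Approximate 95% confidence interval for the mean.

    Assumption: normal approximation; for few runs a t quantile
    would be more appropriate and would widen the interval.
    """
    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
    mean = statistics.fmean(sample)
    half_width = z * standard_error(sample)
    return mean - half_width, mean + half_width


def runs_needed(current_runs, k):
    """Runs needed to reduce the standard error by a factor k.

    Assumes the sample standard deviation stays roughly the same,
    so the standard error scales as 1 / sqrt(n) and n must grow by k**2.
    """
    return math.ceil(current_runs * k ** 2)
```

Under these assumptions, `runs_needed(10, 2)` gives 40: halving the standard error of a 10-run experiment requires four times as many runs.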

8.9.2 Threats to External Validity

External validity pertains to the generality of the results; ideally, the results of the present research should generalise to a wide variety of different software components, potentially containing a wide variety of different faults, possibly resulting in a wide variety of different failure behaviours, in a wide variety of different testing and operating environments.

Threats to external validity arise where the choice of subjects, treatments, or variables involved in the experiments limit the generality of the results.

The focus of the present research is on software components described by a request-response model; however, this is a common model of communication between software components, as described by Martin-Flatin (2005), and the results should therefore generalise to a fairly wide variety of different software components.

The F-measure and delayed failure metrics are based on simple crash/hang failure, because a stateless test oracle was desired, as discussed in Chapter 1. This is a limitation, as correctness of responses is not considered; however, a crash, for example due to incorrect exception handling, is a common type of failure (see Mao & Lu, 2005).

The behaviour of DBMS may not be representative of software components in general, and the behaviour of MySQL and Oracle XE may not be representative of some other DBMS. However, MySQL and Oracle are both Relational DBMS, probably the most common type of DBMS, and as they are both at least partly written in the C/C++ programming language they are likely to share many of the vulnerabilities and errors common to C/C++ with other software components written in that language.

The SUTs studied in the present research run on both the Linux and Microsoft Windows operating systems, which are common system software environments.

The experimental profiles used in the present research are unlikely to be similar to any real operational profiles; as already discussed in this chapter, the experimental profiles may be more likely to reveal software failures than real operational profiles. It may be possible to relate apparent reliability during stochastic testing to actual component reliability; however, this is an area for future research. Testing beyond the limits of real operational profiles is an established practice in other engineering disciplines; for example, Forsberg, Mooz & Cotterman (2005) define ‘Qualification’ as demonstration that “the design will perform in the intended environment, with margin”.

The best defence against threats to external validity is probably to replicate the experiments with a different choice of subjects and treatments; this would be an interesting topic for future research.

8.9.3 Threats to Construct Validity

Construct validity pertains to the use of surrogate measures for properties of interest in the research problem or question.

Threats to construct validity arise where the surrogate measure is not truly correlated with the property of interest. Properties of interest in the present research are software component reliability and robustness, failure delay, and fault detection effectiveness; the surrogate measure for these properties in the experiments is the F-measure as discussed by Chen, Kuo & Merkel (2004). Further research into the correlation of the F-measure with these properties is desirable; other measures of fault detection effectiveness have been proposed, including the E- and P-measures defined in Table 3 in Chapter 2, as discussed by Liu & Zhu (2008).
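The three measures can be compared with a small sketch. It assumes the definitions commonly used in the random testing literature (F-measure: number of tests executed to detect the first failure; E-measure: expected number of failures detected; P-measure: probability of detecting at least one failure); the definitions in Table 3 may differ in detail. The data layout, in which each run is a list of per-test failure flags, and the function names are hypothetical.

```python
from statistics import fmean


def f_measure(outcomes):
    """Tests executed up to and including the first failure.

    `outcomes` is a sequence of booleans (True = failure observed).
    Returns None if the run revealed no failure.
    """
    for position, failed in enumerate(outcomes, start=1):
        if failed:
            return position
    return None


def summarise(runs):
    """Estimate the F-, E- and P-measures from a list of runs."""
    f_values = [f for f in (f_measure(run) for run in runs) if f is not None]
    return {
        # Mean F-measure over the runs that revealed a failure.
        "F": fmean(f_values) if f_values else None,
        # Mean number of failures detected per run.
        "E": fmean([float(sum(run)) for run in runs]),
        # Fraction of runs that detected at least one failure.
        "P": fmean([1.0 if any(run) else 0.0 for run in runs]),
    }
```

Note that a run with no failure contributes to the E- and P-measure estimates but not to the F-measure estimate, which is one reason the measures need not rank testing strategies identically.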

8.10 Summary

This chapter presented a discussion of the present research within the context of stochastic testing, SQL mutation, experimental studies of DBMS, fault detection effectiveness and efficiency, the conceptual framework, delayed failure, the research problem, and threats to validity.

9 CONCLUSION
