CHAPTER 3. Pursuing objectives
3.3. Guaranteeing the protection of rights and access to public services
This section focuses on the aspects of validity and reliability that were identified in relation to the study and describes how they were addressed to ensure research rigour and trustworthiness. As stated earlier in this chapter, the study adopted a mixed-methods research design that included both quantitative and qualitative data.
4.7.1 Validity.
Validity refers to measuring what one claims to be measuring (Creswell, 2003). According to Lincoln and Guba (1985), validity is difficult to assess and has many dimensions: internal validity (credibility), external validity (transferability) and construct validity.
Internal validity is associated with the degree to which a study minimizes systematic errors or research bias, that is, the degree to which a researcher is able to say that no variables other than the ones under study have led to the results. According to Davis (1992), there are multiple ways to achieve internal validity. First, valid studies must provide evidence of lengthy engagement in a given field. In the present study, the data was collected over a period of three months. Another source of validity is the richness and accuracy of the data. To strengthen validity, I provided a detailed and realistic description of how the study was conducted, including the process of data collection and how the data was managed and analysed. Accuracy was further supported by methodological triangulation, in which multiple methods (questionnaire, interviews, retrospective protocols and document analysis) were brought together during the analysis and interpretation phases to ensure an in-depth understanding of the phenomenon under study.
External validity concerns the extent to which the research findings can be generalized beyond the study itself. According to Davis (1992), external validity is established when ‘the findings can be generalized to other contexts and/or subjects’ (p. 606). To achieve external validity, I provided a rich description of the context and the participants so that readers can determine the degree to which the results of the study can be transferred to their own contexts (Mackey & Gass, 2005).
Construct validity refers to how well a research test or tool measures the construct that it is designed to measure (Creswell, 2003). In this study, construct validity indicates the extent to which the questionnaire measures EFL teachers’ beliefs about EGA. To establish this type of validity, I carried out an extensive review of previous studies to establish and justify the need for the current study. In addition, questionnaire items were adapted from studies conducted in the field of LTC and EGA, and feedback was obtained from my supervisor. The questionnaire was also piloted with a group of volunteers who were as similar as possible to the target population. According to Baker (1994), pre-testing or trying out a particular research instrument can identify potential practical problems in following the research procedure and can reveal whether the proposed methods or instruments are inappropriate or excessively complicated.
4.7.2 Reliability.
Reliability refers to the degree to which a research method produces stable and consistent results (Davis, 1992). Research reliability can be divided into three categories: test-retest reliability, parallel forms reliability and inter-rater reliability (Dudovskiy, 2018). In this study, parallel forms reliability and inter-rater reliability were taken into consideration. Parallel forms reliability means that the results obtained from one assessment instrument (concerning a certain phenomenon with a group of participants) should be reproduced if a different instrument is used to measure the same phenomenon with the same participants. For example, if results regarding EFL teachers’ beliefs about EGA are obtained through a questionnaire, interviews with the same teachers should yield similar results; this would demonstrate the consistency of responses and allow comparison where required (triangulation). In this study, parallel forms reliability was addressed by using multiple research instruments.
Two further types of reliability are inter-rater and intra-rater reliability. These are crucial to establish, especially with qualitative research instruments (e.g. interviews, retrospections and document analysis). Inter-rater reliability requires that different assessors who use the same method obtain the same results. In this study, inter-rater reliability was established by asking a second coder to code my qualitative data using the same coding technique. The second coder was an assistant professor in Applied Linguistics who coded one sample from each instrument (interview, retrospection and exam sample). I also provided her with the lists of codes for all datasets, their definitions and examples from the data. The final step was to check how closely the researcher and the second coder agreed in applying the codes. This was achieved by using Scholfield’s (2005) formula (Figure 24). The results of the inter-rater reliability tests are presented in Table 13.
Figure 24. Scholfield’s formula for inter-rater reliability agreement (2005).
Table 13.
Inter-rater Reliability of the Coding

Data source      Number of items coded the same by the two raters    Number of items coded by the researcher    Agreement result
Interview        12                                                  16                                         75%
Retrospection    7                                                   9                                          78%
Document         9                                                   9                                          100%
Table 13 indicates that the percentage of agreement between the researcher and the second rater was below 80% in two datasets (75% and 78%). Ideally, a minimum of 80% agreement is recommended in the literature (Huberman, 1994; Mackey & Gass, 2005). The agreement obtained in this study for two (out of three) samples was just below 80% and was therefore considered acceptable, particularly given that the agreement for the third dataset was 100%. However, it should be noted that it was not possible to discuss the differences and disagreements with the second coder due to her workload and unavailability.
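The agreement figures in Table 13 can be checked with simple arithmetic. The sketch below assumes that the formula amounts to simple percentage agreement, that is, the number of items coded the same by the two raters divided by the number of items coded by the researcher, multiplied by 100. This reading is inferred from the column headings of Table 13 and may not match Scholfield’s (2005) formulation exactly; the function name `percent_agreement` is hypothetical and used only for illustration.

```python
def percent_agreement(coded_same: int, coded_by_researcher: int) -> float:
    """Simple percentage agreement between two raters.

    Assumption: agreement = items coded the same by both raters
    divided by items coded by the researcher, times 100.
    """
    if coded_by_researcher <= 0:
        raise ValueError("coded_by_researcher must be a positive integer")
    return 100.0 * coded_same / coded_by_researcher


# Counts taken from Table 13: (items coded the same, items coded by the researcher)
table_13 = {
    "Interview": (12, 16),
    "Retrospection": (7, 9),
    "Document": (9, 9),
}

# Round to whole percentages, as reported in the table
results = {
    source: round(percent_agreement(same, total))
    for source, (same, total) in table_13.items()
}

for source, pct in results.items():
    print(f"{source}: {pct}%")
```

Under this assumption the rounded values match the 75%, 78% and 100% reported in Table 13. Simple percentage agreement does not correct for agreement that could occur by chance, which is one reason to interpret values near the 80% threshold cautiously.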
As a second measure of reliability, I also assessed intra-rater reliability, which refers to the degree of agreement among repeated assessments performed by a single rater. To accomplish this, I coded the data for the first time in January 2019 and then again in April 2019. My coding on both occasions was identical; this might be because my memory is rather strong and I had been immersed in my data, with the codes constantly present in my mind.