Evaluating the instrument used to collect data is an important part of any research, as it ensures that the measures used were appropriate. The most prominent criteria for evaluating business research are validity and reliability, which are the basic criteria for assessing the accuracy of quantitative research. Reliability is concerned with the consistency of measures, while validity is concerned with whether a measure of a concept actually measures that concept (Bryman & Bell, 2007). Accordingly, measurement should be consistent across time and across the items used; that is, if a measurement is repeated on the same object, similar results should be obtained (Sekaran, 2003).
3.8.1 Reliability
The reliability of a measure refers to the extent to which it is without bias and therefore ensures consistent measurement over time and across the various items in the instrument (Sekaran, 2003). Reliability provides an indication of the stability and the consistency of the instrument. Stability is concerned with whether a measure is stable over time; that is, if an instrument is administered to the same individual on two different occasions, the question is whether it yields similar results (Bryman & Bell, 2007).
Consistency, or internal reliability, indicates whether the indicators that make up the scale or index are consistent; in other words, whether respondents’ scores on any one indicator tend to be related to their scores on the other indicators (Bryman & Bell, 2007). Test-retest, internal consistency, and parallel-form reliability are different forms of measuring reliability. The most widely used form is internal consistency, assessed by Cronbach’s coefficient alpha (Easterby-Smith et al., 2002). In this study, Cronbach’s coefficient alpha was calculated to determine the overall reliability of the multiple-item measures.
The only multiple-item measures in the study were the personal moral philosophy dimensions (idealism and relativism) and the ethical climate types (law and code, company interest, social responsibility, and personal morality). For these measures to be classified as reliable, a Cronbach’s coefficient alpha of 0.7 or greater is generally recommended (Pallant, 2001). However, Nunnally (1978) suggested that a coefficient alpha of between 0.5 and 0.6 is an acceptable level of reliability. Table 3.2 shows that the alpha values for the moral philosophy dimensions ranged from .61 to .79 across both samples. Although the reliability levels of idealism and relativism were higher for management accountants than for accounting students, both are judged adequate for this exploratory research (Peter, 1979). Prior business ethics research obtained similar levels of reliability for both idealism and relativism (Al-Khatib et al., 1997; Ruhi Yaman & Gurel, 2006; Swaidan, Rawwas, & Vitell, 2008; Swaidan et al., 2004). With respect to ethical climate, all types had a Cronbach’s coefficient alpha ranging from .65 to .87, which is within the range obtained by the developers of the scale, Victor and Cullen (1987), whose Cronbach’s coefficient alpha values ranged from .6 to .8. The overall level of reliability of ethical climate is .87. Several previous business ethics studies obtained similar levels of reliability for the four types of ethical climate investigated in this study (Agarwal & Malloy, 1999; Shafer, 2007, 2009; Upchurch, 1998; VanSandt et al., 2006; Vardi, 2001; Venezia & Callano, 2008).
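For readers unfamiliar with the statistic, Cronbach’s coefficient alpha for a scale of k items is commonly computed as alpha = (k / (k - 1)) × (1 − sum of the item variances / variance of the total score). The following is a minimal illustrative sketch of that standard formula, not the procedure actually used in this study (the statistical software used is not stated here). The function name `cronbach_alpha` and the example responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Return Cronbach's coefficient alpha for a multi-item scale.

    `scores` is a list of respondent rows; each row holds that
    respondent's score on each of the k items (hypothetical layout).
    """
    if len(scores) < 2:
        raise ValueError("need at least two respondents")
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(row) != k for row in scores):
        raise ValueError("every respondent must answer every item")

    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total (summed) score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 5 respondents x 4 items on a 1-5 scale.
example = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
]
print(round(cronbach_alpha(example), 2))
```

A value computed in this way would then be compared against the thresholds discussed above (0.7 as the generally recommended level, with lower values sometimes accepted in exploratory work).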
Table 3.2 Cronbach’s Coefficient Alpha Results

| Dimensions | Management Accountants: Question No. | Management Accountants: Items | Management Accountants: Alpha | Accounting Students: Question No. | Accounting Students: Items | Accounting Students: Alpha |
|---|---|---|---|---|---|---|
| Moral idealism | Section B 1-10 | 10 | .74 | Section B 1-10 | 10 | .66 |
| Moral relativism | Section B 11-20 | 10 | .79 | Section B 11-20 | 10 | .61 |
| Law and code | A9 (1-4) | 4 | .79 | - | - | - |
| Company interest | A9 (5-8) | 4 | .72 | - | - | - |
| Social responsibility | A9 (9-12) | 4 | .74 | - | - | - |
| Personal morality | A9 (13-16) | 4 | .65 | - | - | - |

3.8.2 Validity
Validity is considered one of the most important criteria of research (Bryman & Bell, 2007). It refers to the extent to which a test measures what we actually want to measure. Four types of instrument validity are frequently discussed in the research literature. The first is content validity (or face validity), which seeks to ensure that the measure includes adequate and representative items that represent the concept (Sekaran, 2003). It concerns the extent to which the measurement scale reflects what is intended to be measured. According to Emory and Cooper (1991), content validity can be achieved by a careful definition of the research topic and of the items included in the measurement scale. They further suggest that a group of individuals or experts can help in judging how well the instrument meets the standard. Similarly, Bryman and Bell (2007) suggest that content validity might be established by asking other people whether the measure appears to capture the concept under consideration. However, social science researchers disagree about the content of many concepts, which makes it difficult to develop measures with agreed validity (De Vaus, 2002).
The second type of instrument validity is construct validity. To establish it, researchers are encouraged to deduce hypotheses from a theory that is relevant to the concept (Bryman & Bell, 2007). It is considered the most difficult type of validity to understand, evaluate, and report. Generally, construct validity is evaluated by tracking the performance of the instrument scale over years in varied places and populations (Litwin, 1995; Oppenheim, 2003). To ensure construct validity, it is recommended to use established constructs or measurement scales and to take expert opinion into account (De Vaus, 2002).
Concurrent validity is the third type of instrument validity. According to Oppenheim (2003), concurrent validity refers to the extent to which the measurement scale relates to other well-validated measures of the same subject. It can be assessed in terms of the extent to which results obtained from the scale are consistent with the results of other scales developed to measure similar objects (Litwin, 1995; Oppenheim, 2003). Predictive validity, the fourth type, is a related form of validity that refers to the ability of an instrument scale to predict future performance, events, behaviour, and attitudes (Litwin, 1995; Oppenheim, 2003).
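One common way to quantify the kind of agreement described for concurrent validity is to correlate respondents’ scores on the new scale with their scores on an established measure of the same subject. The sketch below computes a Pearson correlation coefficient for that purpose. It is illustrative only and was not part of this study’s analysis; the function name `pearson_r` and the example scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Return the Pearson correlation between two paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for the correlation to be defined")

    return cross / sqrt(ss_x * ss_y)


# Hypothetical total scores for six respondents on two instruments.
new_scale = [32, 41, 27, 45, 38, 30]
established_scale = [30, 43, 25, 44, 35, 33]
print(round(pearson_r(new_scale, established_scale), 2))
```

A strong positive coefficient would be read as evidence of concurrent validity. The same calculation, applied to a scale score and a later outcome, gives a simple indicator of predictive validity.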
Several efforts were made to ensure questionnaire validity. Firstly, an extensive literature review was conducted to define the topic and the purpose of the study. Secondly, several questions, items, and scales that had been applied to different populations and in different settings, such as the ECQ and EPQ, were adopted in this study, thus supporting construct validity (see discussion in section 3.7). According to Sekaran (2003), developing a valid survey instrument involves drawing on valid literature to ensure that any survey questions taken from it are based on validated survey instruments. Thirdly, the questionnaire was reviewed by friends, several doctoral students, and experts, and a pilot study was conducted (see section 3.6.4).