Student samples have often been used in online shopping research (e.g., Balabanis and Reynolds, 2001; Kim et al., 2006; Li et al., 2002; 2003). This is justifiable because students are computer-literate and have few problems using new technology. Students are potential consumers of electrical goods, have actual online experience (Yoo and Donthu, 2001), are homogeneous in nature (Jahng, Jain and Ramamurthy, 2000; Calder et al., 1981) and are topic relevant (Ferber, 1977). Their technological advancement and innovativeness qualify them as a proper sample for online shopping research (Yoo and Donthu, 2001, p. 2).
5.8.1 Appropriate Number of Participants
Deciding the appropriate number of participants in a sample is a complex decision. Hence, this study explains the most commonly used techniques for determining the proper sample size. The first technique relies on rules of thumb. For example, Roscoe (1975) suggests four rules of thumb for deciding the proper sample size (n).
(i) The number of participants should be larger than 30 and less than 500.
(ii) If researchers have more than one group (e.g., male and female), Roscoe (1975) recommends employing more than 30 participants in each group.
(iii) When multivariate analyses are used, Roscoe (1975) advises researchers to have a sample size that is at least 10 times the number of variables used in the analysis. Other scholars offer similar ratios. Stevens (1996) posits that 15 cases per construct are sufficient to obtain trustworthy results from a multivariate analysis. In turn, Bentler and Chou (1987) advise researchers to determine the sample size based on the number of parameters, and posit that if the data are normally distributed, then at least 5 cases per parameter are sufficient.
(iv) If the researcher is conducting a simple laboratory experiment in which some conditions are controlled, then the appropriate sample size is between 10 and 20 participants (Roscoe, 1975).
Other scholars, such as Krejcie and Morgan (1970), propose a table for determining the proper sample size (S) to be drawn from a population (N).
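The rules of thumb above can be expressed as simple arithmetic. The following Python sketch is illustrative only and is not part of this study's procedure. The function and dictionary names are hypothetical. The Krejcie and Morgan (1970) function assumes the formula usually cited as the basis of their table, with a chi-square value of 3.841 (95 per cent confidence), a population proportion of .5 and a margin of error of .05; rounding up to the next whole participant is also an assumption.

```python
import math

# Cases required per unit of analysis, as reported in the text above.
RATIOS = {
    "roscoe_per_variable": 10,        # Roscoe (1975), multivariate analyses
    "stevens_per_construct": 15,      # Stevens (1996)
    "bentler_chou_per_parameter": 5,  # Bentler and Chou (1987), normal data
}


def roscoe_range_ok(n: int) -> bool:
    """Rule (i): the sample should be larger than 30 and less than 500."""
    return 30 < n < 500


def min_n_by_ratio(count: int, cases_per_unit: int) -> int:
    """Minimum sample size for a cases-per-variable (or parameter) rule."""
    return count * cases_per_unit


def krejcie_morgan(population: int, chi_sq: float = 3.841,
                   p: float = 0.5, d: float = 0.05) -> int:
    """Sample size S for a population N (assumed formula and defaults)."""
    numerator = chi_sq * population * p * (1 - p)
    denominator = d ** 2 * (population - 1) + chi_sq * p * (1 - p)
    return math.ceil(numerator / denominator)


if __name__ == "__main__":
    # Hypothetical model with 12 variables under Roscoe's 10:1 rule.
    print(min_n_by_ratio(12, RATIOS["roscoe_per_variable"]))
    # Hypothetical population of 1,000.
    print(krejcie_morgan(1000))
```

Under these assumptions, a model with 12 variables would call for at least 120 participants under Roscoe's rule, and a population of 1,000 would call for a sample of 278.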
The second technique that scholars use to determine an adequate sample size depends on the data analysis technique (Hair et al., 2006). This study explains the five considerations that Hair et al. (2006) recommend for determining the proper sample size when using Structural Equation Modelling (SEM). First, if the distribution of the data deviates from the assumption of multivariate normality, then 15 respondents per parameter is an acceptable number to minimise the problem of deviation from normality. Second, the estimation technique should be considered. SEM is commonly based on the maximum likelihood estimation (MLE) method, which gives adequate results if the sample size ranges from 150 to 400 respondents. Hair and colleagues (2006) explain that if the sample size exceeds 400, then the MLE method becomes more sensitive and the goodness-of-fit measures become poorer. Third, model complexity should be considered, which relates to the number of constructs used in the analysis. The more constructs a model has, the more parameters must be estimated and, as a result, the larger the sample needed to conduct the analysis. Moreover, Hair et al. (2006) assert that if a researcher is using a multi-group analysis, then an adequate sample is required for each group. Fourth, regarding missing data, Hair et al. (2006) posit that the more missing data a study has, the larger the sample it needs. Fifth, Hair et al. (2006, p. 741) advise researchers to consider communalities (the average amount of variation among the measured/indicator variables that is explained by the measurement model) before deciding the proper sample size. Communalities should be above .5 (equivalent to standardised loading estimates of about .7); otherwise the study requires a larger sample. For instance, Hair et al. (2006) assert that if any communality is between .45 and .55, or the model has constructs with fewer than three items, then the sample size should be above 200. On the other hand, if the communalities are lower than .45, then the minimum sample size should be 300 or more.
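The communality thresholds and the cases-per-parameter rule reported above can be combined into a small decision function. The Python sketch below is a simplified reading of the text and is not Hair et al.'s (2006) own procedure. The function names and arguments are hypothetical. Using 150 (the lower bound of the MLE range) as the floor when communalities exceed .55 is an assumption, and missing data and multi-group designs are not modelled.

```python
def communality_from_loading(loading: float) -> float:
    """Communality of an indicator as its squared standardised loading."""
    return loading ** 2


def hair_min_sample(lowest_communality: float,
                    min_items_per_construct: int,
                    n_parameters: int = 0,
                    non_normal: bool = False) -> int:
    """Suggested minimum SEM sample size under the rules in the text."""
    if lowest_communality < 0.45:
        floor = 300
    elif lowest_communality <= 0.55 or min_items_per_construct < 3:
        floor = 200
    else:
        floor = 150  # assumption: lower bound of the 150-400 MLE range

    # Under non-normality, 15 respondents per parameter (first consideration).
    if non_normal:
        floor = max(floor, 15 * n_parameters)
    return floor


if __name__ == "__main__":
    # A standardised loading of .7 corresponds to a communality of about .5.
    print(round(communality_from_loading(0.7), 2))
    # Hypothetical model: lowest communality .50, at least three items each.
    print(hair_min_sample(0.50, 3))
```

For example, a hypothetical model whose lowest communality is .40 would call for at least 300 respondents, which is in line with the sample size of more than 300 adopted for the SEM stage of this study.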
Based on the above discussion, and since this research uses ANOVA and SEM, this study adopts a sample size of more than 300 for the one-way repeated measures ANOVA (Stages 1 and 2) and for the fifth stage in order to achieve trustworthy results. For the two-way repeated measures ANOVA (Stages 3 and 4), this study follows the rule of thumb that Roscoe (1975) suggests for conducting an experiment, namely 10 to 20 participants.
5.8.2 Sampling Techniques
This research employs a non-random sampling technique, namely convenience sampling, which is based on employing participants who are easily accessible (McDaniel and Gates, 2006). Bryman and Bell (2007, p. 198) posit that “in the field of business and management, convenience samples are very common and indeed are more prominent than are samples based on probability sampling”. This technique was chosen because this study asked participants for their permission before conducting the experiments. If a participant agreed to take part, the experiment proceeded; otherwise, the study sought other participants.
5.8.3 Questionnaire Design
Three questionnaires were developed for this thesis. The first was used for the study's experimental manipulation checks (pre-tests). The second tested the progressive levels of the various constructs using perceived 3D authenticity and behavioural intention as the dependent variables (Stages 3 and 4, see Appendix 7). The third was the main questionnaire, used as the main tool for collecting data to test the difference between 3D telepresence and 3D authenticity (Stage 1), to test the difference between 2D and 3D virtual experience (Stage 2), and to test the online S-O-R framework (Stage 5). The third questionnaire consisted of three parts (see Appendix 6):
Part one:
Asked the participants about their ability to control the 3D product visualisation, 3D animated colours, 3D authenticity, 3D telepresence, 3D hedonic and utilitarian values, and 3D behavioural intention.
Part two:
Asked the participants about 2D hedonic and utilitarian values.
Part three:
Contained twelve questions, asking participants about their use of the internet for e-shopping, the number of times they have bought a laptop online, their years of experience of using the internet, how frequently they use the internet to browse e-retailers, gender, marital status, age, level of education, annual income, the country that best describes the participant's culture and the school they belong to.
5.8.4 Justification for Using Five-Point Likert Scale
This study uses a 5-point Likert scale with a neutral midpoint to collect the data. A 5-point Likert scale is commonly used and makes it relatively easy to collect data from respondents in a survey (Preston and Colman, 2000; Sekaran, 2000). Nevertheless, the number of points to use in a Likert scale is a matter of debate (Cox, 1986). For example, some authors prefer scales of seven, nine and sometimes eleven points over scales of two, three or four points. The former increase reliability and validity, whereas the latter generate lower internal consistency, validity and discriminating power (Preston and Colman, 2000). In turn, Hartely and Mclean (2006) find that using a five-point scale often increases the response rate of a study to as much as 90 per cent. Moreover, Dawes' (2002) empirical research reveals that reliability and validity changed only slightly when a seven-point Likert scale was used in comparison with a 5-point Likert scale. Dawes (2002) posits that no noticeable increase in reliability and validity occurred when the number of scale points increased from seven to nine, or even from seven to ten. Dawes (2002) asserts that an eleven-point Likert scale generated the same mean as a 5-point Likert scale, while the kurtosis and skewness of the eleven-point and 5-point scales showed some unsystematic differences. Neumann (1983) posits that 5-point and seven-point Likert scales give similar results (i.e., with regard to means and correlation coefficients). Moreover, the author recommends that researchers use a 5-point Likert scale instead of a seven-point scale, especially when attitudinal research is being carried out.