The philosophical assumptions and research designs adopted in a research process have associated strengths and weaknesses (Saunders, Lewis and Thornhill, 2016). Researchers aim to select the tools that generate the data best suited to addressing the research question (Collis and Hussey, 2014). Concepts such as validity and reliability make it possible to assess the quality of the chosen approach (Saunders, Lewis and Thornhill, 2016) and are examined in this section.
3.4.1. VALIDITY
Validity evaluates whether the chosen research type measures what the researcher intends to measure (Gray, 2014). It also assesses whether the chosen methods can provide valid analysis and generalisation of data (Saunders, Lewis and Thornhill, 2016). The literature identifies multiple types of validity, which can be categorised into three broad groups – face validity, internal validity and external validity (Gray, 2014).
Face validity refers to whether the chosen measurement is actually measuring the intended phenomenon (Collis and Hussey, 2014). Although face validity can be a subjective concept, the researcher can undertake a pilot test to assess whether the methods are measuring the intended concept (Hair et al., 2016). The researcher can also draw on published work and the views of other researchers to assess the quality of the instruments used (Gray, 2014).
Internal validity refers to the extent to which the results were observed due to genuine differences between the studied variables as opposed to other influences or biases (Easterby-Smith et al., 2018). Hence, internal validity is high when accurate causal inferences can be made between the studied variables (Gray, 2014). External validity, in contrast, refers to whether the findings can be generalised to other settings (Easterby-Smith et al., 2018). The research setting limits the extent to which studies can be generalised (Gray, 2014). For example, findings obtained by studying a large company cannot necessarily be generalised to a small business (Easterby-Smith et al., 2018).
Internal and external validity can be depicted as being on opposite ends of a continuum (Sekaran and Bougie, 2016). Internal validity tends to be high for experiments, as they are conducted in highly controlled artificial settings using randomly selected and assigned participants (Gray, 2014). However, using an artificial environment to obtain precise measurements results in low external validity, that is, a limited ability to generalise the results to non-artificial settings. In contrast, more interpretivist, often qualitative, studies tend to be conducted in a natural environment, which supports better generalisation to similar real-life environments. However, they offer less control over confounding and extraneous variables, thus generating data with lower internal validity (Collis and Hussey, 2014). The choice between these approaches tends to be guided by the purpose of the study. Researchers can also put further checks in place to improve the validity of the data. For example, with qualitative interviews, participants can be asked to review transcripts or the researchers' coding approaches to ensure that their views are accurately reflected in the data (Gray, 2014).
Validity of the current study
The face validity of the measures used in the current project (visual attention, familiarity and goals) is high. The research aimed to assess the visual attention of shoppers in a real setting, specifically to investigate how venue familiarity and consumer goals influence visual attention, and how visual attention and choice are associated. Thus, accurately measuring visual attention is important for the current project. As noted in the previous section, visual attention was measured using eye-tracking equipment. The biological structure of the human eye requires humans to move their eyes to enable visual processing (Dogu and Erkip, 2000). Hence, recording eye movements makes it possible to record participants' visual attention. Furthermore, prior academic studies such as Treistman and Gregg (1979) and Chandon et al. (2008) demonstrated the face validity of using eye-tracking to measure visual attention. Although it is possible that respondents change their behaviour when they know they are being observed, Hendrickson and Ailawadi (2014) demonstrated that the use of eye-tracking equipment does not influence the behaviour of respondents. In the post-purchase questionnaire, respondents indicated that they forgot they were wearing the equipment (Hendrickson and Ailawadi, 2014). Furthermore, this method of assessing visual attention has been shown to have less of a biasing effect on other measures such as choice and brand recall (Higgins, Leinenger and Rayner, 2014).
Additionally, the research aimed to assess how familiarity with the venue influences the visual attention of shoppers. The current project measured familiarity with the venue as the number of interactions consumers had had with the venue in a recent period. The measure is consistent with commercial (Mintel, 2019) and academic (Gidlöf et al., 2017) measures of venue familiarity. This approach also follows a basic definition of familiarity, which assumes that people become more familiar with a stimulus or a setting the more they encounter it (Alba and Hutchinson, 1987). Therefore, the face validity of the familiarity measure is high.
Furthermore, the study examined how consumer goal types influence the visual attention of shoppers. The study assumed that consumers who knew exactly which product they wanted to purchase had a specific goal, whereas those who knew they wanted to purchase a beer but did not know which exact brand had an ambiguous goal. A similar approach was used by Wästlund et al. (2015), who identified consumers purchasing a specific jacket or coffee as belonging to a specific goal group, whereas consumers purchasing the jacket or coffee they liked were assumed to have an ambiguous goal. This definition is consistent with the theoretical stance that people with more clearly defined goals hold more specific goals than those whose goals are less defined (Russell et al., 1999).
The aim of the research was to investigate the visual behaviour of respondents in a real-life shopping environment in a pub and to generalise those findings to similar settings, as opposed to generalising findings from sample to population. Carrying out the experiment in a real location, using consumers who planned to carry out the shopping task studied by the current project, increased the external validity of the project (Easterby-Smith et al., 2018). However, the use of a real-life research setting is associated with lower internal validity, as the researcher is not able to control potential confounding variables. The quasi-experimental strategy also does not rely on random allocation of participants, which could lead to lower validity as the groups are not necessarily equivalent (Bryman and Bell, 2015). However, in some cases where randomisation is not possible, a quasi-experimental strategy can be used to reach strong inferences (Easterby-Smith et al., 2018). As the current research aimed to examine how consumer familiarity and consumer goals influence visual attention, this approach should provide the most internally valid data. As the actual visual behaviour of respondents is objectively recorded, it is believed that first-time visitors to a studied pub are likely to exhibit similar behaviour to first-time visitors to other UK pubs, thus providing reliable evidence to test the research hypothesis. By collecting data in multiple cities, the researcher aims to provide further support that the conclusions can be generalised to other venues.
3.4.2. RELIABILITY
Reliability assesses the extent to which the findings are replicable, consistent and bias-free (Sekaran and Bougie, 2016). Reliability is high when similar results are produced in a repeated study (Collis and Hussey, 2014). The concept of reliability is more important in positivist, quantitative studies than in interpretivist, qualitative ones (Bryman and Bell, 2015). Reliability is crucial as it allows the researcher to examine whether the methods contain any systematic errors that could influence the results (Saunders, Lewis and Thornhill, 2016).
The literature describes the elements of reliability in various ways; the current project adopts the definitions proposed by Bryman and Bell (2015). The authors identified three elements that influence reliability: stability, internal reliability and inter-rater reliability. Stability refers to the researcher obtaining a similar set of results from a respondent on different occasions under similar conditions, using the same methods and techniques (Hair et al., 2016).
Thus, a positive correlation between the measures gathered on separate occasions indicates that reliability is high (Sekaran and Bougie, 2016).
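As an illustration of the stability check described above, test-retest reliability can be estimated as the Pearson correlation between scores gathered on two occasions. The following is a minimal sketch only: the function name `pearson_r` and the scores are hypothetical and are not data from the current study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores for five respondents measured on two occasions.
occasion_1 = [12, 18, 9, 22, 15]
occasion_2 = [14, 17, 10, 21, 13]
r = pearson_r(occasion_1, occasion_2)
print(f"test-retest r = {r:.2f}")
```

A value of r close to 1 would indicate that respondents kept a similar rank order across the two occasions.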
Internal reliability assesses the extent to which all items of a scale or measurement capture the same concept (Howitt and Cramer, 2017). The concept is crucial for questionnaire-based data collection and requires the researcher to assess whether the multiple-indicator measures within the instrument are consistent, using Cronbach's alpha test (Gray, 2014). A higher score implies higher internal reliability.
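For illustration, Cronbach's alpha for a scale of k items is commonly computed as k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the respondents' total scores. The sketch below assumes this standard formula with sample variances; the function name `cronbach_alpha` and the item scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha.

    items: list of k lists, each holding one item's scores
    for the same respondents in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are needed")
    n = len(items[0])
    if n < 2 or any(len(item) != n for item in items):
        raise ValueError("every item needs scores from the same respondents")
    sum_item_variances = sum(variance(item) for item in items)
    # Total score of each respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical three-item scale answered by five respondents.
scale = [
    [3, 4, 2, 5, 4],
    [3, 5, 2, 4, 4],
    [2, 4, 3, 5, 5],
]
alpha = cronbach_alpha(scale)
print(f"Cronbach's alpha = {alpha:.2f}")
```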
Lastly, inter-rater reliability refers to how consistent and accurate the judgements made by multiple people are (Bryman and Bell, 2015). The measure applies to tasks such as coding observations, assigning scores to performance or categorising data, when these are performed by a number of coders. Inter-rater reliability is high when the scores of the multiple individuals performing the coding are positively correlated (Gray, 2014).
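When the coding produces categories rather than numeric scores, agreement between two coders is often summarised with Cohen's kappa, which corrects the observed agreement for agreement expected by chance. Kappa is not named in the sources cited above; it is shown here only as one common way to quantify inter-rater agreement. The function name `cohen_kappa` and the category labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who labelled the same observations."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty, equal-length lists of codes")
    n = len(codes_a)
    # Proportion of observations on which the two coders agree.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance, from each coder's category frequencies.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both coders use one category")
    return (observed - expected) / (1 - expected)


# Hypothetical category codes assigned by two coders to six observations.
coder_a = ["bar", "fridge", "bar", "menu", "bar", "fridge"]
coder_b = ["bar", "fridge", "menu", "menu", "bar", "fridge"]
kappa = cohen_kappa(coder_a, coder_b)
print(f"Cohen's kappa = {kappa:.2f}")
```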
Reliability of the current study
Research often requires its measurements to be stable. However, Howitt and Cramer (2017) noted that some psychological characteristics, such as attention, happiness and alertness, are not stable over time and could vary depending on the state of a person when they are taking part in the research. Therefore, the actual elements of the scene that participants noticed in the current study could change if they were retested. However, the current project did not aim to find set patterns of visual behaviour. Instead, the aim was to compare the means between groups of respondents: those with different levels of familiarity, those with different goals, and chosen versus non-chosen products. Therefore, although the actual elements noticed in the scene could differ on a repeated test, it is assumed that the volume of visual attention will remain the same. For example, regular visitors will, on average, look at the scene in a similar manner on a repeated test and pay less attention to the products, even though the actual products they notice in the scene could be different. Additionally, the study recruited a large sample of respondents, which made it possible to average out more extreme cases. The stability of the measurements is therefore expected to be high.
Internal reliability is a concept that is important to research utilising questionnaires or some types of interview. This project gathered actual behavioural data rather than relying on the responses of participants. Thus, the internal reliability measure is not applicable to the current project.
Finally, the collected eye-tracking data requires manual coding of fixations to identify which area of interest a participant looks at (further discussed in the following section).
Some research projects employ multiple researchers to undertake the task and then correlate the classification results in order to assess the extent to which the coders agree. However, it was not possible to have multiple coders within the current project; therefore, all coding was conducted by the researcher. This could result in bias and low inter-rater reliability. In order to address this issue, a clear conservative rule was followed: a fixation was coded to an area of interest only if more than half of the fixation overlaid that area. It should be noted that in most cases the coding was straightforward, but in borderline cases extra diligence was used. Additionally, all recordings were checked at least twice in order to correct any human error. Therefore, although only a single person coded the data, the adopted approach should result in high reliability of categorisation.
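The decision rule above can be stated precisely. The coding in the current project was carried out manually, so the sketch below only formalises the rule under simplifying assumptions that are not taken from the study: the fixation marker and each area of interest are treated as axis-aligned rectangles in image coordinates, and the names `Rect`, `overlap_fraction` and `code_fixation`, as well as the area labels, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in image coordinates (assumed shape)."""
    left: float
    top: float
    right: float
    bottom: float

    def area(self):
        return max(0.0, self.right - self.left) * max(0.0, self.bottom - self.top)


def overlap_fraction(fixation, aoi):
    """Share of the fixation marker's area that lies inside the area of interest."""
    width = min(fixation.right, aoi.right) - max(fixation.left, aoi.left)
    height = min(fixation.bottom, aoi.bottom) - max(fixation.top, aoi.top)
    if width <= 0 or height <= 0:
        return 0.0  # no overlap at all
    return (width * height) / fixation.area()


def code_fixation(fixation, aois):
    """Return the name of the area holding more than half of the fixation, else None.

    With non-overlapping areas at most one area can satisfy the rule;
    if areas overlap, the first match in dictionary order is returned.
    """
    for name, aoi in aois.items():
        if overlap_fraction(fixation, aoi) > 0.5:
            return name
    return None


# Hypothetical layout: two adjacent product areas on a shelf.
aois = {
    "product_a": Rect(0, 0, 100, 100),
    "product_b": Rect(100, 0, 200, 100),
}
clear_case = code_fixation(Rect(80, 40, 110, 70), aois)       # two thirds on product_a
borderline_case = code_fixation(Rect(85, 40, 115, 70), aois)  # exactly half on each
print(clear_case, borderline_case)
```

Under this formalisation, a fixation split exactly in half between two areas is coded to neither, which reflects the conservative intent of the rule.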