The SIS 3.0 includes 59 items in eight domains (strength, hand function, mobility, physical and instrumental activities of daily living (ADL/IADL), memory and thinking, communication, emotion, and social participation) (Carod-Artal et al., 2008; Carod-Artal et al., 2009; White et al., 2007). Apart from covering all measurable life domains and aspects of functioning, a good stroke-specific measure of HRQOL should have essential and rigorous psychometric properties, including validity, reliability, acceptability, responsiveness, proxy suitability, sensibility and a minimal clinically important difference (MCID) (Barak & Duncan, 2006; Buck et al., 2000).
Validity
Like any outcome measure, a QOL measure must have its validity verified in order to establish its scientific robustness (Buck et al., 2000). Validity is the ability of an instrument to measure what it is intended to measure (Barak & Duncan, 2006; Buck et al., 2000; Owolabi, 2010). The validity of an assessment tool is enhanced by the absence of floor and ceiling effects (Owolabi, 2010).
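Floor and ceiling effects are usually quantified as the proportion of respondents who obtain the lowest or highest possible score. The Python sketch below shows one way to do this for a single domain. The 0-100 score range and the 15% cut-off are assumptions (the cut-off is a convention used elsewhere in the psychometric literature, not a value given in this section), and the scores are invented for illustration.

```python
from typing import Sequence

# Assumed cut-off: a floor or ceiling effect is flagged when more than 15% of
# respondents score at the minimum or maximum. This value is not from the text.
FLOOR_CEILING_THRESHOLD = 0.15


def floor_ceiling(scores: Sequence[float], min_score: float, max_score: float) -> tuple[float, float]:
    """Return the proportions of respondents at the lowest and highest possible scores."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor = sum(1 for s in scores if s == min_score) / n
    ceiling = sum(1 for s in scores if s == max_score) / n
    return floor, ceiling


def has_floor_or_ceiling_effect(
    scores: Sequence[float],
    min_score: float,
    max_score: float,
    threshold: float = FLOOR_CEILING_THRESHOLD,
) -> bool:
    """True if either proportion exceeds the (assumed) threshold."""
    floor, ceiling = floor_ceiling(scores, min_score, max_score)
    return floor > threshold or ceiling > threshold


# Hypothetical domain scores on an assumed 0-100 scale.
domain_scores = [100, 100, 85, 70, 100, 55, 40, 100, 90, 65]
print(floor_ceiling(domain_scores, 0, 100))                # (0.0, 0.4)
print(has_floor_or_ceiling_effect(domain_scores, 0, 100))  # True
```

In this example 40% of respondents already reach the maximum score, so the domain could not register further improvement for them.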
Content validity is defined as the extent to which a measure represents all domains of interest in a given construct (Buck et al., 2000; Owolabi, 2010). Assessing content validity requires experts in the subject matter to evaluate whether test items assess the defined content, and it requires more statistical testing than the assessment of face validity (Barak & Duncan, 2006).
Construct validity refers to the specification of the factors that account for variation in the intended measures and of the theoretical associations between them. A hypothesis about the probable strength and direction of each expected relationship is stated in advance, and validity is supported when the observed correlations agree with these prior hypotheses (Owolabi, 2010). Convergent (concurrent) validity, one form of construct validity, is obtained when different measures of a similar construct are logically associated and highly correlated.
On the other hand, discriminant validity, another form of construct validity, is demonstrated when dissimilar measures or domains are less strongly correlated (Barak & Duncan, 2006; Owolabi, 2010).
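Testing construct validity in this way amounts to stating, before the analysis, the expected direction and strength of each correlation and then checking the observed coefficients against that statement. The sketch below uses Pearson's r. The cut-offs (|r| of at least 0.50 for convergent and at most 0.30 for discriminant validity), the measure names such as `walking_test`, and the scores are all hypothetical and chosen only for illustration.

```python
import math
from typing import Optional, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists with at least two observations")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has zero variance")
    return sxy / math.sqrt(sxx * syy)


def supports_hypothesis(
    r: float,
    direction: Optional[str] = None,
    min_abs: float = 0.0,
    max_abs: float = 1.0,
) -> bool:
    """Check an observed r against an a-priori hypothesis about direction and strength."""
    if direction == "positive" and r <= 0:
        return False
    if direction == "negative" and r >= 0:
        return False
    return min_abs <= abs(r) <= max_abs


# Hypothetical scores for five patients.
sis_mobility = [20, 40, 60, 80, 100]
walking_test = [2, 4, 5, 8, 10]    # assumed similar construct -> convergent
sis_memory = [70, 50, 90, 50, 70]  # assumed dissimilar construct -> discriminant

r_conv = pearson_r(sis_mobility, walking_test)
r_disc = pearson_r(sis_mobility, sis_memory)
print(supports_hypothesis(r_conv, direction="positive", min_abs=0.50))  # True
print(supports_hypothesis(r_disc, max_abs=0.30))                        # True
```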
Criterion validity is defined as the performance of an instrument compared with an existing gold standard or with the outcome that the measure was intended to assess (Barak & Duncan, 2006; Owolabi, 2010). Predictive validity, a form of criterion validity, refers to the extent to which a test can indicate how well an individual will perform at a later time (Barak & Duncan, 2006).
Reliability
Reliability is the extent to which a score is free of random error, such that measurements of the same individual on independent occasions or by different observers produce comparable results (Barak & Duncan, 2006; Buck et al., 2000).
Internal consistency reliability is the most commonly used estimate of the reliability of an outcome measure. It is the average degree of association among the items on a test (Barak & Duncan, 2006). Cronbach's coefficient alpha is used to evaluate the extent of equivalence and association between responses to items or questions tapping the same concept (Owolabi, 2010). Internal consistency is considered excellent at 0.80 or above, adequate at 0.70–0.79 and poor below 0.70 (Barak & Duncan, 2006). Cronbach's alpha increases with both the number of items and the correlation between items tapping the same concept. Nunnally's criterion defines an acceptable level of alpha as 0.70 or more (Owolabi, 2010).
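Cronbach's alpha can be computed from the item variances and the variance of the total score with the standard formula alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The Python sketch below implements that formula with sample variances and applies the bands quoted above; the response data are invented for illustration.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; responses[i][j] is respondent i's score on item j."""
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer the same k >= 2 items")
    item_variances = [variance(item) for item in zip(*responses)]  # sample variance per item
    total_variance = variance([sum(row) for row in responses])     # variance of summed scores
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def classify_alpha(alpha: float) -> str:
    """Bands reported by Barak & Duncan (2006), as quoted in the text."""
    if alpha >= 0.80:
        return "excellent"
    if alpha >= 0.70:
        return "adequate"
    return "poor"


# Hypothetical answers of four respondents to three items of one domain.
answers = [
    [2, 3, 3],
    [4, 4, 5],
    [3, 3, 4],
    [5, 4, 5],
]
alpha = cronbach_alpha(answers)
print(round(alpha, 3), classify_alpha(alpha))  # 0.923 excellent
```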
Test-retest reliability is the measure of correspondence between scores obtained by the same person at two different times (Owolabi, 2010). A minimum test–retest reliability of 0.90 has been proposed when a measure is used to evaluate an individual's ongoing progress during treatment (Barak & Duncan, 2006). The difficulty lies in determining whether observed changes are due to chance or reflect real improvement or deterioration over time (Barak & Duncan, 2006; Owolabi, 2010).
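This section does not specify which statistic expresses test–retest reliability. A common choice is the intraclass correlation coefficient, and the sketch below assumes the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1), computed from the ANOVA mean squares and compared with the 0.90 minimum mentioned above. The choice of ICC form and the data are assumptions for illustration.

```python
from typing import Sequence

# Minimum test-retest reliability quoted in the text (Barak & Duncan, 2006).
MIN_TEST_RETEST = 0.90


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores[i][j] is subject i's score on occasion j.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two subjects")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("every subject needs the same k >= 2 occasions")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between-subject
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between-occasion
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols                   # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denom


# Hypothetical scores of five patients on two occasions (rows = patients).
test_retest = [
    [62, 64],
    [48, 47],
    [75, 77],
    [55, 55],
    [81, 80],
]
icc = icc_2_1(test_retest)
print(round(icc, 3), icc >= MIN_TEST_RETEST)
```

Because this form measures absolute agreement, a systematic shift between occasions lowers the ICC even when patients keep the same rank order, which is relevant when the aim is to separate real change from measurement error.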
Inter-rater reliability refers to the extent of agreement (correlation) between two or more observers who assess the same respondent (Barak & Duncan, 2006; Owolabi, 2010). Generally, a minimum of 80% agreement between observers is required (Barak & Duncan, 2006).
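Percentage agreement between two observers can be computed as the share of respondents who receive identical ratings. The sketch below assumes categorical ratings from exactly two observers; with more observers the comparison would have to be repeated for each pair, which is an assumption rather than a procedure from the text. The ratings are invented.

```python
from typing import Sequence

# Minimum agreement quoted in the text (Barak & Duncan, 2006).
MIN_AGREEMENT = 0.80


def percent_agreement(rater_a: Sequence[object], rater_b: Sequence[object]) -> float:
    """Proportion of respondents to whom two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must rate the same, non-empty set of respondents")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)


# Hypothetical ratings of five respondents by two observers.
observer_1 = ["mild", "moderate", "severe", "mild", "mild"]
observer_2 = ["mild", "moderate", "moderate", "mild", "mild"]
agreement = percent_agreement(observer_1, observer_2)
print(agreement, agreement >= MIN_AGREEMENT)  # 0.8 True
```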
Responsiveness
Responsiveness of a QOL measure is essential when determining the effect of a treatment on the patient's health (therapeutic effect) (Barak & Duncan, 2006; Houlden et al., 2006). It is the ability of an HRQOL measure to detect even small changes within an individual over time, and is also referred to as sensitivity to change within patients over time (Barak & Duncan, 2006; Houlden et al., 2006). Disease-specific measures can be more responsive than generic measures because they measure domains of particular relevance to people with the condition, which allows small changes to be detected. Responsiveness is most commonly evaluated through correlation with other scores, effect sizes, standardized response means, relative efficiency, and the sensitivity and specificity of change scores (Barak & Duncan, 2006; Owolabi, 2010). Responsiveness can be established by applying a paired t-test to within-subject changes. It can also be expressed as an effect size, calculated as the change in mean score from baseline to follow-up divided by the standard deviation of the baseline scores (Owolabi, 2010).
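Several of these indices can be computed directly from paired baseline and follow-up scores. The Python sketch below calculates the effect size as defined in the text, the standardized response mean (assumed here to follow its usual definition, mean change divided by the standard deviation of the change scores), and the paired t statistic. Obtaining a p-value for t would additionally require the t distribution with n - 1 degrees of freedom, which is omitted here. The scores are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def responsiveness(baseline: Sequence[float], follow_up: Sequence[float]) -> dict[str, float]:
    """Within-subject responsiveness indices for paired baseline/follow-up scores."""
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need paired scores for at least two patients")
    changes = [f - b for b, f in zip(baseline, follow_up)]
    mean_change = mean(changes)
    sd_change = stdev(changes)
    sd_baseline = stdev(baseline)
    if sd_change == 0 or sd_baseline == 0:
        raise ValueError("indices are undefined when a standard deviation is zero")
    return {
        # Effect size: mean change divided by the SD of baseline scores (as defined in the text).
        "effect_size": mean_change / sd_baseline,
        # Standardized response mean: mean change divided by the SD of the change scores.
        "srm": mean_change / sd_change,
        # Paired t statistic: mean change divided by its standard error (df = n - 1).
        "paired_t": mean_change / (sd_change / math.sqrt(len(changes))),
    }


# Hypothetical domain scores for four patients before and after treatment.
before = [10, 12, 14, 16]
after = [13, 14, 18, 19]
for name, value in responsiveness(before, after).items():
    print(f"{name}: {value:.2f}")
```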
Acceptability
In this study, as reported by Buck et al. (2000), acceptability is determined by pretesting the instrument with patients in terms of wording, response options and general layout, and is indicated by a high response rate (Buck et al., 2000; Owolabi, 2010). Instruments that are brief, simple, quick to complete and made up of a small number of items are more acceptable, but may compromise content validity, precision and responsiveness (Buck et al., 2000; Owolabi, 2010).
Mode of Administration
Self-administered measures tend to be less resource intensive, but completing them may be especially difficult, if not impossible, for patients with cognitive or language difficulties that affect concentration and comprehension. Equally, interviewer-administered questionnaires may be difficult for some stroke patients with speech problems (dysphasia) who are unable to respond to an interview (Buck et al., 2000; Owolabi, 2010). It is therefore essential to determine whether a QOL measure can be self-administered, interviewer-administered or both, depending on what is practical in a given setting (Barak & Duncan, 2006; Buck et al., 2000).
Proxy suitability
Use of proxies is appropriate when there is acceptable agreement between the responses of patients and those of a close relative or significant other who answers the questions as he or she believes the patient would. Proxies may be essential because of the difficulties stroke patients may encounter in communicating or in understanding research questions (Barak & Duncan, 2006; Buck et al., 2000; Owolabi, 2010). To limit selection bias, using proxies where necessary may be preferable to omitting more severe cases from trials, particularly because such individuals are likely to have considerably reduced QOL (Buck et al., 2000; Owolabi, 2010).
Sensibility
Sensibility is one of the major determinants of success or failure of clinical measures.
The chosen outcome measures must not be burdensome for the patient and must correspond to the goals of the intervention and the study (Barak & Duncan, 2006; Houlden et al., 2006). Sensibility is also described as the overall appropriateness, importance and ease of use of an instrument (Barak & Duncan, 2006; Houlden et al., 2006).