Concerning these latter two points there is little supporting statistical evidence. Point 4, for example, refers to bias in construct validity. This is said to exist
"...when a test is shown to measure different hypothetical traits (psychological constructs) for one group than for another" (Reynolds 1983 p.245)
and is most commonly assessed by means of factor analysis. The conclusion of factorial similarity across cultures has been reached for a number of different test batteries using a number of different techniques. Those investigated include the WISC-R (Reynolds 1983), the Simultaneous-Successive test battery (Das et al 1979) and the Kaufman Assessment Battery for Children (Giordani et al 1996). Consequently the evidence to date indicates that these tests function in essentially the same manner across different cultures.
Point 5 refers to criterion-related, or predictive, validity. To assess this, the regression equations relating test scores to the criterion must be compared across groups (Jensen 1980). If:
"for any group (assessed) there is a significant difference (from other groups) in the slopes or the intercepts, or in the standard errors of estimates of the regression lines," (Kline 1993 p. 166)
then the test is, by a statistical definition, biased.
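As a rough illustration of Kline's regression criterion, the Python sketch below fits a separate least-squares line of criterion on test score for each group and compares the slopes, intercepts and standard errors of estimate. It is a minimal sketch under stated assumptions: the function names are hypothetical, the z statistics are a large-sample approximation that treats the two fits as independent, and formal bias studies more usually fit a single moderated regression (a group term plus a score-by-group interaction) with exact t- and F-tests.

```python
import math

# Hypothetical helpers for illustration; the names are not from Kline (1993).

def fit_line(x, y):
    """Least-squares regression of criterion y on test score x for one group.
    Returns intercept a, slope b, their standard errors and the standard
    error of estimate (residual SD)."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 cases per group")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    see = math.sqrt(sse / (n - 2))                 # standard error of estimate
    se_b = see / math.sqrt(sxx)                    # SE of the slope
    se_a = see * math.sqrt(1 / n + mx ** 2 / sxx)  # SE of the intercept
    return {"a": a, "b": b, "se_a": se_a, "se_b": se_b, "see": see, "n": n}


def compare_groups(g1, g2):
    """Approximate z statistics for slope and intercept differences between
    two independently fitted lines, plus the ratio of the larger to the
    smaller residual variance (an F-type ratio for the standard errors of
    estimate). Assumes neither fit is perfect (see > 0)."""
    z_slope = (g1["b"] - g2["b"]) / math.hypot(g1["se_b"], g2["se_b"])
    z_intercept = (g1["a"] - g2["a"]) / math.hypot(g1["se_a"], g2["se_a"])
    v1, v2 = g1["see"] ** 2, g2["see"] ** 2
    return {"z_slope": z_slope, "z_intercept": z_intercept,
            "variance_ratio": max(v1, v2) / min(v1, v2)}
```

An absolute z beyond about 1.96 would suggest a difference at roughly the 5% level. Because the intercept is the predicted criterion score at a test score of zero, centring the test scores first usually makes the intercept comparison easier to interpret.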
Zeidner (1987) used this regression criterion to test the hypothesis that cultural bias compromises the predictive validity of formal tests. The Scholastic Aptitude Test (SAT) was administered to university students from different ethnic groups in Israel. Initial investigations of construct validity had shown a common construct underlying the test across the different groups. Performance on the SAT was then compared with later academic success using regression analysis. There was no evidence of slope bias. There was some evidence of intercept bias, but this took the form of under-prediction of the achievement of two unexpected groups: ethnic Israelis (the majority group) and females. It was therefore concluded that, where the same criterion was used (school or academic achievement), there was little statistical evidence for the claim of systematic bias leading to under-prediction of scholastic achievement amongst ethnic minorities (Zeidner 1987). Reynolds (1983) reached the same conclusion following a number of cross-racial comparisons of the predictive validity of the WISC-R in both school-age and pre-school children. Again, where differences did occur, they operated in the opposite direction to that predicted by claims of bias, underestimating the performance of majority groups (Reynolds 1983).
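Under-prediction of the kind Zeidner reports can be checked by fitting one common regression line to all candidates and averaging the residuals (actual minus predicted criterion score) within each group: a positive mean residual means the group does better than the common line predicts (it is under-predicted), a negative one that it is over-predicted. The Python sketch below is a minimal illustration under that assumption; the helper names are hypothetical and the toy data are invented, not Zeidner's.

```python
# Hypothetical helpers for illustration; not taken from Zeidner (1987).

def pooled_fit(x, y):
    """Least-squares line y = a + b*x fitted to all candidates together."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def mean_residual_by_group(x, y, groups):
    """Mean of (actual - predicted) criterion score for each group, using the
    single pooled regression line.
    > 0: group is under-predicted; < 0: group is over-predicted."""
    a, b = pooled_fit(x, y)
    sums, counts = {}, {}
    for xi, yi, g in zip(x, y, groups):
        sums[g] = sums.get(g, 0.0) + (yi - (a + b * xi))
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}


if __name__ == "__main__":
    # Invented toy data: at the same test score, group "B" does better on
    # the criterion than group "A".
    scores = [1, 2, 3, 1, 2, 3]
    criterion = [1, 2, 3, 2, 3, 4]
    groups = ["A", "A", "A", "B", "B", "B"]
    print(mean_residual_by_group(scores, criterion, groups))
    # prints {'A': -0.5, 'B': 0.5}
```

In this toy case the common line under-predicts group "B" by half a criterion point on average, which corresponds to an intercept difference with equal slopes.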
In addition to investigating predictive validity in this way, Kline suggests that test-retest reliability and internal consistency should also be assessed, in order to evaluate the appropriateness of any test for use in a different culture. These assessments, however, need to be carried out on a test-by-test and context-by-context basis. The former requires correlational evidence of the stability of scores over time (Kline 1993). The latter can be assessed by an item analysis, which identifies items with low or inconsistent scores so that those showing such bias can be removed (Drasgow 1987, Kumar et al 1993). If these criteria are met, then there is little statistical support for the claim of cultural bias (Vernon 1967, Cole & Scribner 1974, Jensen 1980, Sattler 1982, Scarr 1984, Taylor 1991).
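The two checks Kline describes can be sketched in Python as follows. Test-retest reliability is taken here as the Pearson correlation between scores from two administrations, and internal consistency is summarised by Cronbach's alpha, with a corrected item-total correlation (each item against the total of the remaining items) used to flag candidate items for removal. These are common choices, not necessarily the specific procedures of Drasgow (1987) or Kumar et al (1993); the function names and the 0.2 cut-off are illustrative assumptions.

```python
import math

# Hypothetical helpers for illustration; the 0.2 threshold is an assumption.

def pearson(x, y):
    """Pearson correlation, e.g. between scores from two administrations of
    the same test (test-retest reliability)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant variable; treated as no association
    return sxy / math.sqrt(sxx * syy)


def _variance(v):
    """Sample variance (n - 1 denominator)."""
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)


def cronbach_alpha(items):
    """items: one list of scores per item, all over the same respondents."""
    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]
    item_var = sum(_variance(item) for item in items)
    return k / (k - 1) * (1 - item_var / _variance(totals))


def flag_items(items, threshold=0.2):
    """Indices of items whose corrected item-total correlation falls below
    the threshold."""
    flagged = []
    for idx, item in enumerate(items):
        rest = [sum(resp) - resp[idx] for resp in zip(*items)]
        if pearson(item, rest) < threshold:
            flagged.append(idx)
    return flagged
```

The corrected item-total correlation excludes the item from the total it is compared with, so an item cannot look consistent merely by correlating with itself. An item that runs against the rest of the scale both gets flagged and lowers alpha.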
Despite supporting the argument that there is little statistical evidence for test bias, Zeidner (1987) suggests that bias shows itself in other important ways. Cited amongst these are inappropriate outcome criteria and the over-interpretation of test results. The answer to the question of whether bias exists in mental testing therefore appears to rest on the definition of what constitutes bias, and on the interpretation of differences in performance found between groups. If the information is used to demonstrate or predict achievement potential in vocational activities, or functioning in the community at large, then there are distinct limitations to the application of the traditional static assessment procedures cited above (Brown & French 1979, Rogoff et al 1984). If, on the other hand, the information is to be used to understand or predict achievement and functioning in school, then such tests retain value, although the unfamiliarity of their content and procedures may seriously reduce their sensitivity (Serpell 1979).
Culture Free Assessments
One approach to overcoming potential test bias has been to produce culture-fair or culture-free tests. Examples of tests for which these claims have been made include Raven's Progressive Matrices, Cattell's Culture Fair Test, the Porteus Maze Test and the Test of Non-Verbal Intelligence (TONI) (Cummins 1984, Parmar 1989, Kline 1993). These tests have two major weaknesses. The first is the limited range of mental functions sampled: these assessment tools tend to be non-verbal, focusing upon visual-spatial ability and analogical reasoning (Cummins 1984, Boivin et al 1995). The second major criticism concerns the concept of culture-freeness or fairness itself. This implies the assessment of cognitive functions independent of experience, yet, to quote Cole:
"it is extremely doubtful if we can discover any mental processes independent of experience". (Cole cited in Cummins 1984, p 191)
One of the tests listed above, Raven's Progressive Matrices (Raven cited in Kline 1993), requires the "observation of perceptual detail and some awareness of sequence." (Sigman et al 1989b p. 1465). The child is presented with a series of patterns (matrices) from which one is missing. S/he then has to select the missing pattern from the options provided. In addition to the weakness of having only a limited focus to the functions being assessed, Kline (1993) also criticises Raven's Progressive Matrices on two further counts. Firstly, he suggests that the test is not sufficiently finely graded, with too few items for each age group to give a truly sensitive measure. Secondly, he questions its use with children unfamiliar with the type of material used in the test, claiming that it is likely to underestimate their understanding and ability.
Specific problems associated with the visual nature of the task are likely to contribute to the potential underestimation of ability suggested by Kline. In tasks of visuo-perceptual discrimination many differences between cultures have been found, particularly where right/wrong answers are required (Gregory 1966). Evidence presented by Gregory suggests that different features of visually presented material are more salient to individuals from different cultures. It is not that things are "seen" differently, but rather that different features are chosen on which to base interpretations and decisions; errors may therefore arise from factors other than an inability to perceive or to reason (Wober 1975, Serpell 1979). Wober's review of investigations in Africa questions the assumption of many testers that visually presented materials are more appropriate in non-literate societies. But the reported difficulties with visual material must not be overestimated either. A cross-cultural investigation of the use of pictorial information compared the performance of African, Indian and European secondary school students on information presented either pictorially or in written form (Jahoda et al 1976). No specific weaknesses were found in the performance of African students on information displayed pictorially; in fact, the presence of pictorial information enhanced their recall of material presented in written form (Jahoda et al 1976). The authors cautioned, however, that these results applied only to drawings of single objects. The reported difficulties of African children with more complex pictures, and with spatial relationships within pictures (Hudson cited in Jahoda et al 1976), are likely still to be relevant to performance on tasks such as Raven's Matrices.
Raven's Matrices have previously been used in Kenya, though not with the specific cultural group currently being studied, for both clinical (Lukwago, personal communication) and research (Sigman et al 1989b) purposes. In the Kenyan clinical setting their main value is held to lie in the observational measures provided by the structured assessment format, rather than in any actual score elicited (Lukwago, personal communication). In the research setting Raven's Matrices have contributed to a composite score, which also included a vocabulary comprehension measure. It was this composite score that was reported to correlate significantly with school achievement (Sigman et al 1989b).
An assessment of the validity of another non-verbal test, the Test of Non-Verbal Intelligence (TONI) (Brown et al cited in Parmar 1989), illustrates a further limitation of the non-verbal approach to test design (Parmar 1989). The performance of ninety-three school students aged 7 to 9 years in India on the TONI was compared with their performance on a more traditional intelligence test, the Wechsler Intelligence Scale for Children (Wechsler 1974). Results showed that performance on the TONI was in fact the poorer of the two. This led to the conclusion that the non-verbal format was actually disadvantageous, probably because of the more abstract nature of the tasks which tend to be included in non-verbal tests (Parmar 1989).
It can be concluded from these examples that cultural influences upon performance in assessment tasks cannot be avoided. Attempts to avoid them are likely to narrow the range of cognitive functions being assessed, and may even exaggerate the effect of particular weaknesses. If the aim is to elicit the best possible performance across a broad spectrum of cognitive functions, then account will need to be taken of specific cultural influences upon development. Information on how these influences affect cognitive functions and test-session behaviour should then be incorporated into the construction of the assessment battery.
"The Developmental Niche"
Like Vygotsky and Luria, Super and Harkness (1986) acknowledge the vital role of the environment in understanding the development of psychological processes. They provide an approach to conceptualising the relationship between specific environmental influences and the resulting development, which they refer to as the "developmental niche", and state that:
"Although not a formal theory ..., the developmental niche provides a framework for examining the effects of cultural features on child rearing in interaction with general developmental parameters." (Super & Harkness 1986 p.546).
It aims to provide a
"theoretical framework for studying the cultural regulation of the micro environment of the child." (Super & Harkness 1986 p.552)
It describes three major sub-systems which operate in conjunction with each other, and to an extent overlap, to define the specific environmental influences acting upon the child. These are: a) The physical and social setting in which the child lives; b) Culturally regulated customs of child care and child rearing; and c) The psychology of the caretakers. The consistency of the experiences provided by each sub-system, and the overall framework they form, is said to determine the social, affective and cognitive rules adopted by the child.
a: The physical and social setting in which the child lives
This refers to the range of experiences available in the immediate physical environment, which determines the materials and repertoires of behaviour with which the child is familiar. One example of the specificity of development that might then arise comes from the analysis of the observed precocity of rural African children on measures of early motor development (Dasen 1988). This precocity has been linked to a combination of environmental demands and specific customs of child care (dirt floors motivate the child to avoid crawling, and the parent to actively encourage walking skills). This interpretation, and the context-specific nature of skill development, is supported by a closer inspection of the early maturation data (Gerber & Dean cited in Dasen 1988, and Super cited in Super and Harkness 1986), which shows that the superior performance is largely accounted for by measures of standing and walking (Super & Harkness 1986; Dasen 1988). The specificity of this superiority in motor development has also been observed in children being screened for school readiness (Mitchell 1995). Rural African children were found to gain higher scores on gross motor tasks. This superiority did not extend to fine motor tasks, particularly those requiring the use of tools such as scissors and pencils, which are in more common use in urban homes.
The question is then raised as to how specialised abilities can become. Cross-cultural research investigating Piagetian stages of conceptual development has claimed a universal sequence in the development of cognitive processes, but with differences in the ages at which the stages appear (Kiminyo 1977, Dasen 1988, Laboratory of Comparative Human Cognition 1979). The similarities found in the performance of subjects from different cultures have been taken as support for the existence of universal structures of cognitive thought. The differences have been related to differences in "cultural and educational transmission" (Piaget 1966), which appear as a maturational lag. However, this explanation presupposes uniform developmental end points, whereas Vygotskian theory argues that differences between cultures should lead to qualitatively different end points in development (Vygotsky 1934/1986). This debate is relevant to establishing the predictive validity of the assessment tools used. Its resolution awaits further research, since cross-cultural studies have not yet provided definitive evidence to support either argument (Dasen 1977, 1988).
What such research has been able to establish so far is summarised in the following quotation from Rogoff et al:
"A major contribution that cross-cultural research and theorising has made to developmental psychology is the notion that the meaningfulness of the materials, demands, goals and social situation of an activity channels an individual's performance on a task." (Rogoff et al 1984 p.564).
Indeed, changes in each of the following have been demonstrated to bring about a significantly improved level of performance in children from cultures different from that for which the test was originally devised: the language of instruction (Kamara & Easley 1977; Serpell 1979; Cazden & John 1971); materials (Serpell 1979); the structure of the task (Chione et al 1993); and the structure of the testing situation (Cole & Scribner 1974). This evidence argues for careful attention to the content and structure of the tasks used, in order to elicit performance that is a truer sample of underlying competence (Laboratory of Comparative Human Cognition 1979).
A more detailed example of this issue is provided by Serpell's investigation of cross-cultural similarities and differences in perceptual skills (Serpell 1979). Observation of children's real-life activities determined which materials were most familiar to different groups of children. These were wire for Zambian children (boys in particular), paper and pencil for British children, and clay, which was equally familiar to both gender and cultural groups. The children were then asked to copy shapes and designs using one of these three media. The pattern of relative performance on these formal pattern-reproduction tasks matched that predicted by this pattern of familiarity: where experience was equal, so too were the levels of performance measured. In addition, Serpell suggests that familiarity might also exert an indirect influence on observed skill levels through its influence upon motivation to attempt the task. The reluctance of the group of British schoolchildren even to attempt the wire-modelling task is cited as evidence of this (Serpell 1979). It was also suggested that failure on tasks attempting to measure perceptual skills could be attributable to a misunderstanding of the task demand as to what constitutes a good copy. To the assessor, the instruction "the same as" sets the goal of an exact copy, but the child may have the different goal of producing merely a similar figure (Serpell 1979). As the source of failure in such an instance is conceptual rather than perceptual, the task cannot be said to be a valid measure of visual perception (Wober 1975).
Dynamic Assessment
The lack of familiarity with all aspects of the testing procedure is central to many of the explanations given for the lower performance levels observed on a whole range of assessment tasks in children from poor, rural and non-white homes (Brislin et al 1973; Miller-Jones 1989; Dasen 1988; Laboratory of Comparative Human Cognition 1979). This unfamiliarity hypothesis appears to be supported by evidence cited in Brislin et al (1973) that both practice and prompts have been found to boost the performance of African children previously unfamiliar with standardised testing techniques by between 7 and 22%, the largest improvement being found for subjects receiving a combination of practice and prompts.
Further evidence that instruction during testing not only boosts performance but also increases the predictive power of assessments has come from the development of dynamic approaches to assessment (Feuerstein et al 1986, 1987, Lidz & Thomas 1987, Campione 1989). Dynamic assessment defines intelligence as the ability to learn and profit from experience, and the role of training and instruction in this approach to assessment has theoretical as well as pragmatic value. The roots of dynamic assessment can be traced back to Vygotsky's zone of proximal development, described in an earlier chapter (Vygotsky 1934/1986). Different ways of operationalising the measurement of this zone have led to different assessment procedures, most incorporating a test-teach-test format. One of the most influential pioneers of the dynamic approach to assessment is Feuerstein, who worked for many years with immigrants to Israel and with groups of children in North Africa. He defined intelligence as:
"the capacity of an individual to use previously acquired experiences (emphasis original) to adjust to new situations. The two factors stressed in this definition are the capacity of the individual to be modified by learning and the ability of the individual to use whatever modification has occurred for future adjustments." (Feuerstein cited in Cummins 1984 p.198).
The assessment procedure Feuerstein devised (the Learning Potential Assessment Device, LPAD) aims to assess an individual's potential for being 'modified', that is, to identify deficient cognitive strategies and to assess the individual's ability to adopt new strategies. It is stressed that this should occur in an atmosphere designed to foster motivation and develop positive attitudes towards the process itself. The LPAD is designed for children aged 8 years and over, to identify weaknesses in mental functioning and to feed directly into an individualised remedial programme (Instrumental Enrichment, IE). It is consequently targeted at children who have been identified as having difficulties with mainstream education. The primary goal is therefore prescriptive rather than predictive, with attempts being made to engender a structural change in the child's cognitive functioning. The application of the LPAD and IE has been reviewed by Campione (1989) and Macgregor (1992). The evidence in support of the prescriptive goal is equivocal, but the results do support the contention that pre-test assessments underestimate intellectual skills in a significant proportion of poor school achievers.
This contention is also supported by the application of other models of dynamic assessment (Budoff 1987, Campione 1989, Carlson & Wiedl 1992). Budoff and colleagues (Budoff 1987) measure learning potential by assessing the gain score between pre-test and post-test. In a study designed to compare the effects of practice and instruction on gain scores, children were divided according to their IQ scores into bright, average and poor groups. The first group benefited most from simple practice, whilst the largest gain scores in the other two groups were achieved