4.2. Testing the hypothesis
4.2.4. Summary table of the dependent variable and its total scores
understanding of ingredients and some attempt to achieve a quality finish,
though the content may be high in sugar and fat.
Te 3 - level 4 - Food - A compact bar has been made which achieves a quality finish and shows that thought has gone into providing energy in a form that would be palatable and portable, though not necessarily healthy, for the user.
Te 3 - level 5 - Food - A plan for making has been provided and the finished bar,
containing some protein, fat, vitamins and minerals, as well as
carbohydrates, provides a compact solution. There is an attempt to enable the user to divide the bar into smaller pieces, using divisions which remain after the bar has cooled or set.
Progression in the assessment criteria
Finally, the assessment criteria at each level for each attainment target were reviewed across the tasks to ensure comparability of demand. Where modifications were made, the new criteria were re-examined to ensure that progression within the task was not affected. An example of cross-task criteria is provided below.
Comparability of the assessment criteria
Te 3 - level 3 - Construction materials
A clamp has been made which works satisfactorily. Materials and equipment have been chosen such that the clamp has been made to a satisfactory level of accuracy and quality which enables it to be used.
Te 3 - level 3 - Control
A beacon board has been made which works reliably. Materials and equipment have been chosen and used, such that the board has been made to a
satisfactory level of accuracy and quality.
Te 3 - level 3 - Food
A compact bar has been made which shows a basic understanding of
ingredients and some attempt to achieve a quality finish, though the content may be high in sugar and fat.
(From level 3 onwards, a bar should be defined as something which maintains its shape and in which all the contents adhere together.)
In the trial, teachers commented favourably on the assessment procedure. Levelling was commonly felt to provide a fairer assessment of each pupil's capability. It was also far more manageable than identifying which statements of attainment should have been evidenced.
Research and Development of the assessment instruments for tests
Issues relating to the statements
The decision to introduce pencil and paper tests was a political one. Politically, it was justified by the teacher evaluation of the 1991 pilot (KS1), which stated that the assessment procedures were unmanageable. The decision was not based on the suitability of this subject for this style of assessment. In design and technology, the statements of attainment which these tests sought to assess were not designed as assessment criteria for pencil and paper testing. Their ambiguity, lack of clarity and lack of clear progression make the statements difficult to use as assessment tools in any assessment context. These problems are even more apparent in pencil and paper tests. The statements describe attributes of what is essentially a practical activity. The ability of fourteen-year-olds to address such statements in a theoretical context, drawing on their experience of design and technology, was open to question from the outset. With experience, guidance and planned teaching, pupils' capacity to answer tests of this type will improve. However, this will only be achieved at the expense (in terms of time) of the unique practical experience which this subject offers.
August 1991 - August 1992
The initial research was based on a model which sought to assess all four attainment targets. Following a decision of the Secretary of State, however, both trials (held in February and March 1992) and the Summer Pilot assessed only attainment targets 1 and 4 via the test. Although the Autumn trialling was therefore superseded by a different model, some important conclusions were drawn from its outcome. The first test papers assessed performance by outcome: questions were general, and teachers "levelled" a pupil's response by applying "what to look for" exemplification based on a band of specified statements. It was decided not to proceed with this approach for the following reasons:
• the questions developed for this approach needed to be broad and general in nature to accommodate the wide range of statements and levels which each test covered;
• pupil responses were bland and uniform, which resulted in a lack of clear differentiation - teachers found it difficult to discriminate confidently between answers;
• assessments made by teachers were extremely varied and inconsistent. Because they lack clarity and clear progression, the statements of attainment are open to wide interpretation. These problems are particularly acute when teachers are attempting to determine the worth of a response in relation to four or five levels;
• as only a single statement could be assessed at each level, the evidence on which levels were being awarded was superficial (this depends on the number of statements at a level in an attainment target, which varies from 1 to 6).
It was concluded that questions needed to be far more focused and would, therefore, have to assess specific statements. This would allow the assessment criteria to be more focused. Consequently, all subsequent trials of pencil and paper tests were based on the principle of differentiation by task. A rationale for an assessment procedure dealing with only these two attainment targets was developed, based on the following criteria:
• the test should be common to all pupils regardless of the practical task undertaken;
• the test should draw on the experience of the practical task;
• the test should also assess the programme of study;
• the two attainment targets should be given equal weighting in terms of the number of statements assessed and the time allowed;
• the test should be as coherent and logical as the specification allowed.
The following model, for the 1992 Summer Pilot, was developed to meet these criteria.
Section     Pupil context                                                                   Assessment
Section A   review the long task                                                            Te 4
Section B   identifying needs and opportunities in an extension of the long task            Te 1
Section C   evaluate the work of others in response to the context extension in Section B   Te 4
As only two attainment targets were being assessed by this model, the number of statements addressed at each level was doubled. This provided far more evidence of achievement and gave greater confidence in the resulting assessments. Section A assesses one Te 4 statement at each level, whilst Section C assesses another (at some levels, e.g. Te 4 level 7, the same statement is assessed in both sections). Section B assesses two Te 1 statements at each level. The pattern of the three tests is shown in the diagram overpage. Appendix 4.8, p. 312, shows the test instructions and an example relating to one AT.
[Figure: The pattern of the three differentiated tests in design and technology (panels: attainment target four; attainment target one; attainment target four)]

Selection of statements of attainment
The selection of statements to be assessed via the test had to ensure that coverage was sufficient to be legally valid. These were selected on the basis that, at each level, one of the following two cases applied:
• If two or fewer statements occurred at a level, all were selected (11 of the 20 possible cases: 4 levels in Te 1 and 7 levels in Te 4).
• If more than two statements occurred at a level, the following criteria were applied (9 of the 20 possible cases: 6 levels in Te 1 and 3 levels in Te 4):
- can the statement be tested by a pencil and paper test?
- does the statement allow a logical question to be asked within the sequence?
- does the statement allow a different issue to be addressed within the test band(s) in which it falls?
In practice, this left no scope for choice in selecting statements. Indeed, in case 1, some statements might have been rejected had the case 2 criteria been applied. In a test of this nature, it was impossible to assess every aspect of each statement fully. The questions therefore sought to probe aspects of a statement, with the statement itself serving as the final decision, or legal point of arbitration, in each case.
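The two-case selection rule can be expressed as a short function. This is a minimal sketch only. The `Statement` class, its three boolean flags and the statement codes in the usage note are hypothetical stand-ins for the professional judgements described above. The sketch also shows why case 1 allowed no choice: statements at a level with two or fewer are kept even when they would fail the case 2 criteria.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    # Hypothetical record for one statement of attainment (SoA).
    code: str               # illustrative code, e.g. "4d"
    pencil_and_paper: bool  # can it be tested by a pencil and paper test?
    fits_sequence: bool     # does it allow a logical question within the sequence?
    distinct_issue: bool    # does it address a different issue within its test band(s)?


def select_statements(at_level: list[Statement]) -> list[Statement]:
    """Apply the two-case selection rule to the statements at one level."""
    if len(at_level) <= 2:
        # Case 1: two or fewer statements, so every one is selected.
        return list(at_level)
    # Case 2: more than two statements, so keep those meeting all three criteria.
    return [s for s in at_level
            if s.pencil_and_paper and s.fits_sequence and s.distinct_issue]
```

For example, at a hypothetical level with statements "1a" and "1b", both are returned unchanged even if "1a" meets none of the criteria. At a level with three or more statements, only those passing all three checks survive.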
In addition to the statement, the mark scheme told teachers "what to look for" when marking each question. This information was based firmly on the statement being addressed. In practice, the "what to look for" became the assessment criteria which teachers used. Consequently, the validity of each question's ability to probe the statement became central to the research. Examples of possible responses were also provided. These were generally seen as useful but did create issues of interpretation: as is often the case with examples, some teachers, quite wrongly, believed they were the only correct answers. Because the questions were mainly discursive, professional judgement had to be exercised to reach an assessment decision. An example of a "what to look for" statement is given below.
"Two valid reasons are given for the choice of materials. Answers may refer to visual properties such as colour or pattern, or the effect of combining materials. A personal statement such as 'because I like them' is not adequate at this level."
The use of the word "valid" created assessment problems because, like the SoAs, it relied on teachers interpreting "valid" consistently. "What to look for" statements were more general in Section A because of the link to the practical task. In the other two sections, B and C, the "what to look for" criteria were more specific. However, none of the questions had a straightforward, unambiguous right or wrong answer, so assessment was always more than a matching process. This lack of precision is a consequence of the style of the statements of attainment being assessed and is directly related to their process characteristics.
A key issue in this trial was the application of the assessment criteria. A question assessing a single statement with sufficient depth and coverage could be quite complex. It might consist of several parts, and consequently the assessor had to make several different assessment decisions. How many of these needed to be answered satisfactorily for the pupil to be accepted as having satisfied the statement? As there was a requirement to operate the principle of criterion referencing, it was decided that a pupil must satisfy all aspects of the test relating to a statement to be assessed as having satisfied it. Asking pupils to get every aspect of a question correct was undoubtedly a severe rule. The principal outcome from this pilot was the conclusion that:
mark schemes should be constructed to allow for a margin of failure within a question or level - compensation should be available when the majority of an answer is correct.
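The difference between the pilot's all-or-nothing rule and the compensatory approach recommended above can be shown with a small sketch. The function names, the `margin` parameter and the default margin of one failed part are illustrative assumptions. The pilot did not specify how large a margin of failure should be. "Majority correct" is read here as strictly more than half of the parts.

```python
def satisfied_strict(parts: list[bool]) -> bool:
    """1992 pilot rule: every part relating to the statement must be satisfactory."""
    return all(parts)


def satisfied_with_margin(parts: list[bool], margin: int = 1) -> bool:
    """Hypothetical compensatory rule: allow up to `margin` failed parts,
    provided the majority of the answer is correct."""
    failures = parts.count(False)
    correct = len(parts) - failures
    return failures <= margin and correct > len(parts) / 2
```

Under the strict rule, a three-part answer with one weak part fails the statement. Under the compensatory rule, it passes. A two-part answer with one failure still fails, because only half of it is correct.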
Another concern was the way in which the test had assessed Te 4 - Evaluation. As detailed in chapter 3, a political decision had created the split between the practical task and its evaluation. The lack of focus of the practical task in this pilot meant that the test question had to be general in nature, to cover all possible activities and outcomes. If the test were to be more specific, so that the marking criteria could be more focused, then the practical task would have to be prescribed. In a statutory context there were sound arguments for taking this decision.
The 1992 Pre-Statutory assessment trial - test
The two principal findings from the 1992 pilot required that a different structure be established for the written test. Te 4 has two generic strands: evaluating one's own work and evaluating the work of others. The test of Te 4 was now linked exclusively to a focused task undertaken in a prescribed material context. To assess this strand securely, it was decided not to assess the other strand, evaluating the work of others. The test therefore had two sections.
Section     Pupil context                                          Assessment
Section A   evaluate the practical task                            Te 4
Section B   identifying needs and opportunities in a new context   Te 1
Section A would need to be developed in three different contexts, so that each pupil could sit a test related to the practical task he or she had undertaken. Section B would be common to all pupils.
The focus of assessment
As with the practical task, there were persuasive reasons why the assessment should focus on the level at which a pupil was operating rather than on mastery of specific statements. Teachers were undoubtedly more comfortable with a process which involved reading an answer, deciding whether it met the assessment criteria, and awarding a mark if it did or withholding it if it did not. Research therefore focused on designing a test which accommodated this principle within a criterion-referenced framework. On the basis of pre-test trials, the following model was developed.
A decision is taken about the number of mark points available at each level. At each level:
- the statements to be assessed are identified;
- questions are devised which probe these statements;
- on the basis of trialling, a mastery level is set;
- questions are modified to ensure progression and hierarchy between levels;
- trialling confirms the reliability of the model.
As manageability was perceived to be a major factor two other strategic decisions were taken.
1. The questions would be designed to create the same number of mark points at each level.
2. The questions would be calibrated to achieve the same mastery requirement at each level.
In addition, the statement assessed by each question was identified. This made it possible also to identify the statements a pupil had satisfied, for formative purposes in relation to teacher assessment and reporting to parents. A further rule applied at every level where more than one statement was assessed, that is, at all levels except those with only one statement in the Order. At these levels, the mastery level could not be achieved on assessment evidence relating to only one of the statements.
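The per-level mastery check described above can be sketched as follows. The sketch makes several assumptions. It uses eight mark points per level, the figure the statutory mark scheme later adopted. The threshold of six marks is purely hypothetical, since the real mastery level was set from trial data. The single-statement rule is read as "marks must come from at least two statements". The function and parameter names are illustrative.

```python
MARKS_PER_LEVEL = 8    # same number of mark points at every level
MASTERY_THRESHOLD = 6  # hypothetical: the real mastery level was set by trialling


def level_mastered(marks_by_statement: dict[str, int],
                   threshold: int = MASTERY_THRESHOLD) -> bool:
    """Return True if a pupil reaches mastery at one level.

    marks_by_statement maps each statement assessed at the level (e.g. "4d")
    to the marks the pupil earned on the questions probing that statement.
    """
    total = sum(marks_by_statement.values())
    if total > MARKS_PER_LEVEL:
        raise ValueError("more marks awarded than are available at this level")
    # Statements for which the pupil earned at least one mark.
    credited = [code for code, marks in marks_by_statement.items() if marks > 0]
    # Where a level assesses more than one statement, mastery must not rest
    # on evidence relating to only one of them.
    if len(marks_by_statement) > 1 and len(credited) < 2:
        return False
    return total >= threshold
```

Because the same threshold applies at every level, the check is identical from level 1 to level 10. Only the statements that carry the marks change. At a single-statement level such as Te 4 level 7, all the marks may come from that one statement.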
The Assessment Instrument
Following extensive trials and modifications, an assessment instrument, or mark scheme, was developed for the first full statutory assessment. The previous research outcomes resulted in an assessment instrument which was a compromise between a true criterion-referenced approach and a traditional norm-referenced mark scheme. This was necessary for three reasons.
1. To promote higher levels of reliability by providing teachers with a familiar procedure which they would feel confident using.
2. To accommodate political dictates in relation to the method of assessment.
3. To devise a system which would meet the legal requirements of coverage and would enable both summative and formative information, at a variety of levels, to be readily gathered.
The mark scheme provided assessment criteria at three levels of specificity:
1. legal - the statement being assessed;
2. professional - general characteristics of a satisfactory answer;
3. examples - a range of pupil responses which had been categorised as acceptable or unacceptable, ranging from answers on the cusp to those which were clearly in one category or the other.
Selection of statements of attainment
The test was required to assess half or more of the SoAs and consequently the following pattern emerged.
Level   Te 1   Te 4
1       1      1
2       2      1
3       1      1
4       3      2
5       1      2
6       2      3
7       2      1
8       2      1
9       1      1
10      2      1

Statements per test: Test 1 - 12; Test 2 - 15; Test 3 - 14; Test 4 - 11
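The "statements per test" totals can be checked against the per-level counts in the table. The check assumes that each test spans four consecutive levels, overlapping the next test by two (levels 1-4, 3-6, 5-8 and 7-10). The table does not state these ranges; they are inferred here because they reproduce all four totals exactly. Each test total is then the sum of the Te 1 and Te 4 counts over the levels it covers.

```python
# Per-level counts taken from the table above: level -> (Te 1, Te 4).
COUNTS = {1: (1, 1), 2: (2, 1), 3: (1, 1), 4: (3, 2), 5: (1, 2),
          6: (2, 3), 7: (2, 1), 8: (2, 1), 9: (1, 1), 10: (2, 1)}

# Assumed tiering (not stated in the table): four levels per test, overlapping by two.
TIERS = {"Test 1": range(1, 5), "Test 2": range(3, 7),
         "Test 3": range(5, 9), "Test 4": range(7, 11)}


def statements_per_test(counts, tiers):
    """Sum the Te 1 and Te 4 counts over the levels covered by each test."""
    return {name: sum(sum(counts[level]) for level in levels)
            for name, levels in tiers.items()}


print(statements_per_test(COUNTS, TIERS))
# {'Test 1': 12, 'Test 2': 15, 'Test 3': 14, 'Test 4': 11}
```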
The following statements were selected to be assessed via the test:
Te 1: 1a & 1b; 2b & 2c; 3a & 3b; 4d & 4e; 5a & 5b; 6a & 6c; 7a & 7c; 8a & 8b; 9a & 9b; 10b & 10c
Te 4: 1a & 1b; 2a & 2b; 3a & 3b; 4a & 4b; 5a & 5b; 6a & 6b; 7a; 8a; 9a; 10a
The questions addressed either a whole SoA or part of an SoA. This approach was essential because the design and technology statements are not equal in demand: some are relatively straightforward, whilst others are extremely complex. It was decided that eight marks should be available at each level. In some instances all eight marks were targeted on a single statement (Te 4 level 7), but in the majority of cases they were distributed between two statements (Te 1 at all levels). The main purpose of the trial was to establish an appropriate level of mastery. The questions could be fine-tuned following the trial to ensure satisfactory rates of achievement, but there was no established definition of what was satisfactory, i.e. what percentage of pupils should achieve certain levels. The mastery level did, however, have to be decided on the basis of the trial, before the first statutory assessment.
The development of the pupil material - Practical tasks
1. 1990-91
As discussed at the beginning of this chapter, before assessment could take place pupils had to be provided with a realistic task which would generate appropriate assessment evidence. Initially, these tasks asked teachers to work with pupils in a style and fashion with which they were unfamiliar. To ensure that the tasks developed promoted the philosophy of the Order, the following principles were established. Each activity should:
- offer pupils the opportunity to undertake a task encompassing the four attainment targets, which would enable them to demonstrate their capability in a holistic fashion;
- provide a structure in which pupils could identify needs and opportunities which would form the focus of their task;
- allow pupils to undertake a task in any of the disciplines which fall within the design and technology federation;
- provide pupils with a valid learning experience which drew on the key stage 3 programme of study;
- provide pupils with a satisfying and successful experience which promoted progress and achievement.
The tasks also needed to gain the support of teachers, so in addition each activity should:
- offer teachers the opportunity to take ownership of the activity and adapt its delivery to suit their own circumstances and teaching styles without compromising the standard nature of the activity;
- be sufficiently robust to withstand the variety of teaching styles, curricula organisation and staff collaboration evident in schools.
Research into structured activities
Three models/structures of delivery were devised which offered sound approaches