• No se han encontrado resultados

CAPÍTULO II: Marco teórico

2.3. Paradigma Sociocognitivo – humanista

Almost all events or features in the world are based on probabilities rather than certainties. The movement of billiard balls on a table becomes rapidly unpredictable the more the cue ball strikes the other balls; uncertainties govern the movement and position of fundamental particles; if someone has long hair then that person is probably a woman; most (but not all) fruits are sweet, and so on. Fortunately, some events or co- variations of features are more likely than others, otherwise the world would be even more unpredictable than it already is. Our beliefs reflect the fact that some things are more likely to occur than others. Aeschylus probably did not weigh up the benefits of going out in a boat against the potential costs of being killed by a falling tortoise.

Bayes’Theorem allows us to combine our prior beliefs, or the prior likelihood of something being the case, with changes in the environment. When new evidence comes to light, or if a particular behaviour proves to be useful in achieving our goals, then our beliefs (or our behaviour) can be updated based on this new evidence. The theorem can be expressed as:

Odds (A given B)=LR×Odds(A)

where A refers to one’s beliefs, a theory, a genetic mutation being useful, or whatever; B refers to the observed evidence or some other type of event. Odds (A given B) could reflect the odds of an illness (A) given a symptom (B), for example, in the equation, Odds (A) refers to what is known as the “prior odds”—a measure of the plausibility of a belief or the likelihood of an event. For example, a suspicion that someone is suffering from malaria might be quite high if that person has just returned from Africa. The LR is the Likelihood Ratio and is given by the formula:

where P is the probability, B is an event (or evidence), and A is the aforementioned belief, theory, or other event. The Likelihood Ratio therefore takes into account both the probability of an event (B) happening in conjunction with another event (A) (a symptom accompanying an illness) and in the absence of A (the probability of a symptom being displayed for any other reason).

Anderson has used Bayesian probabilities in his “rational analysis” of behaviour and cognition. For example, in the domain of memory, the probability that an item will be retrieved depends on the how recently the item was used, the number of times it has been used in the past, and the likelihood it will be retrieved in a given context. In problem solving there is an increased likelihood that the inductive inferences produced by using an example problem with a similar goal will be relevant in subsequent similar situations.

or, as a result of using a word processor, one might learn the concept of a “string” (Payne et al., 1990).

Declarative memory in ACT-R

The “chunk” is the basic unit of knowledge in declarative memory. A chunk is a data structure also known as Working Memory Elements (or “wimees”). Only a limited number of elements can be combined into a chunk—generally three. Examples would be USA, BSE, AIDS. Chunks have configural properties such that the different component elements have different roles. “1066” is a chunk because it is a meaningful unit of knowledge (assuming you know about the Norman Conquest), but the elements rearranged to form 6106 would no longer constitute a chunk. Chunks can be hierarchically organised. Broadbent (1975) found that 54% of people split items entering working memory into pairs, 29% split them into three items, 9% into four items, and 9% longer. French phone numbers, being divided into pairs of numbers, are therefore ideal. Generally speaking two or three items make a reasonably good chunk size.

Chunks can also be used to represent schemas in declarative memory. Table 8.1 represents a schema for an addition problem. In the first part the problem is given

TABLE 8.1

Schema representation of the problem

a name (“problem 1”). This is followed by a series of “slots”, the first one (the “isa” slot) being the problem type. The next row has a slot for columns which are listed in brackets. This part also shows a simple example of a hierarchical arrangement, as the columns themselves are represented as schematic chunks. If we take “column 1” as an example we can see that this represents configural information. The chunk type (represented by the “isa” slot) determines to an extent the types of associated slots that follow. The chunk size is also determined by the number of associated slots.

Working memory in ACT-R

Working memory is a convenient fiction. It is simply that part of declarative memory that is currently active and available. Information can enter working memory from the environment, through spreading activation in declarative memory (associative priming—see Chapter 3), or as a result of the firing of a production in production memory. For example, if your goal is to do three-column addition and you haven’t yet added the rightmost column then add the rightmost column. If your goal is to do three-column addition and you have added the rightmost column (now in working memory along with any carry) then add the middle column. If your goal is to do three-column addition and you have added the rightmost column and the middle column (both in working memory) then add the leftmost column, and so on.

Production memory in ACT-R

The production is the basic unit of knowledge in procedural memory. Examples of productions have already been encountered in Chapter 4 along with aspects of their role in the transfer of procedural skill. Declarative knowledge is the basis for procedural knowledge. ACT-R requires declarative structures to be active to support procedural learning in the early stages of learning a new skill. Productions in ACT-R are modular. That is, deleting a production rule will not cause the system to crash. There will, however, be an effect on the behaviour of the system. If you had a complete model of two-column subtraction and deleted one of the production rules that represented the subtraction problem, then you would generate an error in the

subtraction. This is one way you can model the kinds of errors that children make when they learn subtraction for the first time (Young & O’Shea, 1981).

Procedural memory in ACT-R can respond to identical stimulus conditions in entirely different ways depending on the goals of the system.

Learning in ACT-R

The development of cognitive skill involves somehow transforming declarative knowledge (such as the instructions for driving a car) into procedural knowledge (skilled driving). As a result of this process, knowledge is compiled. This is an analogy with computers where “high-level” languages have to be interpreted by the computer’s built-in machine code before the program can run. That is, the program has to be compiled, and the result of converting it to machine code means it is no longer directly accessible in its original form. Anderson’s views on how this comes about have evolved over the years.

When we encounter a novel problem for the first time we hit an impasse—we don’t immediately know what to do. We might therefore attempt to recall a similar problem we have encountered in the past and try to use that to solve the current one. According to Anderson, this process involves interpretive problem solving. This means that our problem solving is based on a declarative account of a problem-solving episode. This would include, for example, using textbook examples or information a teacher might write on a board. Anderson argues that even if we have only instructions rather than a specific example to hand, then we interpret those instructions by means of an imagined example and attempt to solve the current problem by interpreting this example. As a result the information is compiled. “The only way a new production can be created in ACT-R is by compiling the analogy process. Following instructions creates an example from which a procedure can be learned by later analogy” (Anderson, 1993, p. 89).

Figure 8.5. The architecture of ACT-R (Anderson, 1993).

In short Anderson is arguing that learning a new production—and, by extension, all skill learning— occurs through analogical problem solving. In this sense the model is similar to Holland and co-workers’ (1986) PI model.

Figure 8.6 shows the analogy mechanism in ACT-R in terms of the problem representation used throughout this book. In this model, the A and C terms represent goals (the problem statement including the problem’s goal). The B term is the goal state, and the line linking the A and B terms is the “response”: the procedure to be followed to achieve the goal state. As in other models of analogy, mapping (2 in Figure 8.6) involves finding correspondences between the example and the current problem.

Before looking in a little more detail at ACT-R’s analogy mechanism try Activity 8.1. The mapping of the elements in the example onto elements in the current problem is shown in more detail in Figure 8.7. The structure in the example shows you that the problem involves a Lisp operation, that the operation involves multiplication, that there are two “arguments”: argl and arg2, and that there is a method for doing it shown by response 1. These elements can be mapped onto the elements of the current problem. The problem type is different, nevertheless both are arithmetic operations. They are similar because

ACTIVITY 8.1

Imagine you are given the problem of writing a Lisp function that will add 712 and 91. You have been given an example that shows you how to write a Lisp function that will multiply two numbers together:

Defun multiply-them (2 3)

*2 3

You know that “defun” defines a function and that “multiply-them” is a name invented for that function and that * is the multiplication symbol.

How would you write a function in Lisp that will add 712 and 91? (See p. 237 for answer.)

they are semantically related and because they belong to the same superordinate category type.

Figure 8.6. The analogy mechanism in ACT-R.

THE POWER LAW OF LEARNING

It is a common experience (so common as to be unremarkable) that the more we practice something the easier it gets. Furthermore, you may have noticed that you can get to a point where it seems hard to improve any further. Top athletes, for example, have to put in a huge amount of practice just to stand still (if the metaphor doesn’t sound too bizarre). Indeed, there is a ubiquitous finding in the literature that learning something new produces a curve of a predictable shape. That is, there is a relationship between amount of practice and measures such as reaction times, recognition times, number of items recalled, number of items correct, and so on. Figure 8.8 shows the relationship between practice and measures such as reaction times or error rates.

This relationship is known as the Power Law of Learning (Newell & Rosenbloom, 1981). It is called a Power Law because the equation involves practice raised to a power:

T=cPx

where T is time taken to perform a task, c is a constant, P is the amount of practice, and x represents the slope of the line in Figure 8.8. The curve in the figure shows that the benefits of practice get smaller and smaller, although we still learn something. Examples of the Power Law of Learning have turned up all over the place (Anderson, 1982; Anderson, Boyle, & Yost, 1985; Logan, 1988; Newell & Rosenbloom, 1981; Pirolli & Anderson, 1985).

Anderson provides several examples and several rationales for how the Power Law operates using a Bayesian analysis and how the learning mechanisms of

ACT-R spontaneously generate such learning (and forgetting) curves (see Anderson, 1990, 1993, 2000a, b for further details).

CRITICISMS OF PRODUCTION-SYSTEM MODELS

There have been a number of criticisms of production-system models of learning and expertise. For example, Anderson has argued that there is an asymmetry in production rules such that the conditions of rules will cause the actions to fire but the actions will not cause conditions to fire. The conditions and actions in a condition-action rule cannot swap places. Learning LISP for coding does not generalise to using LISP for code evaluation; for example, one can become very skilled at solving crossword puzzles, but that skill should not in theory translate into making up a crossword for other people to solve. The skills are different. We saw in Chapter 4 that transfer was likely only where productions overlapped between the old

Figure 8.7. Mapping correspondences between the example and the current goal.

and new task (e.g., Singley & Anderson, 1989). Practising a skill creates use-specific or context-specific production rules. McKendree and Anderson (1987) and Anderson and Fincham (1994) have provided experimental evidence for this use-specificity in production rules (see Table 8.2). ACT-R successfully models this transfer or retrieval asymmetry.

In Table 8.2, V1 is a variable that has the value (A B C). CAR is a “function call” in LISP that returns the first element of a list. The list in this case is (A B C) and the first element of that list is “A”. When LISP evaluates (CAR V1) the result is therefore “A”. The middle column in Table 8.2 represents a task where a LISP expression is evaluated. The third column represents a situation where someone has to generate a function call that will produce “A” from the list (A B C). McKendree and Anderson argue that the production rules involved in the evaluation task and in the generation task are different.

TABLE 8.2

Information presented in evaluation and generation tasks in McKendree and Anderson (1987)

Type of information Evaluation Generation

V1 (ABC) (ABC)

Function call (CAR V1) ?

Result ? A

Adapted from Müller (1999).

P1: If the goal is to evaluate (CAR V1) And A is the first element of V1 Then produce A.

P2: If the goal is to generate a function call And ANSWER is the first element of V1 Then produce (CAR V1)

As the condition sides of these two productions do not match, the transfer between them would be limited, bearing in mind that the knowledge they represent is compiled. A similar finding was made by Anderson and Fincham (1994) who got participants to practise transformations of dates for various activities such as: “Hockey was played on Saturday at 3 o’clock. Now it is Monday at 1 o’clock.” Transforming one day and

Figure 8.8. The Power Law of Learning: (a) represents a hypothetical practice curve and (b) is the same curve

represented as a logarithmic scale.

time into another involves adding or subtracting 1 or 2 from Day 1 (e.g., Saturday +2) and Hour 1 (1–2). Practising on one transformation did not transfer to the reverse transformation.

Müller (1999), however, challenged the idea of use-specificity of compiled knowledge. One effect of such use-specificity is that skilled performance should become rather inflexible; yet expertise, if it is of any use, means that knowledge can be used flexibly (this aspect of expertise is discussed in the next chapter). Müller also used LISP concepts such as LIST, INSERT, APPEND, DELETE, MEMBER, and LEFT. He also got his participants to learn either a generation task or an evaluation task using those concepts in both. His study was designed to distinguish between the results that would be obtained if the use of knowledge was context-bound and those that follow on from his own hypothesis of conceptual integration. According to this hypothesis concepts mediate between a problem’s givens and the requested answers. Concepts have a number of features that aggregate to form an integrated conceptual unit. The basic assumptions of the hypothesis (Müller, 1999, p. 194) are:

(a) conceptual knowledge is internally represented by integrative units; (b) access to the internal representation of conceptual knowledge depends on the degree of match between presented information and conceptual features; (c) conceptual units serve to monitor the production of adequate answers to a problem; and (d) the integration of a particular feature into the internal representation depends on its salience during instruction, its relevance during practice, or both.

Whereas ACT-R predicts that transfer between different uses of knowledge would decrease with practice, the hypothesis of conceptual integration predicts that transfer between tasks involving the same concepts would increase with practice. Müller found typical learning curves in his experiments, but did not find that this pre-sumably compiled knowledge was use-specific. There was a relatively high transfer rate between evaluation and generation tasks. Thus the overlap and relevance of conceptual features was important in predicting transfer and allowed flexibility in skilled performance. Going back to our example of the crossword puzzler, it seems reasonable to suppose that a skilled puzzle solver can transfer his or her knowledge of how to solve crossword puzzle clues to the task of generating them. Müller’s hypothesis can account for this type of transfer whereas use-specificity of productions should make such transfer unlikely. Others have criticised production systems as representations of cognitive skill.

As Copeland (1993, p. 101) puts it:

When I turn out an omelette to perfection, is this because my brain has come to acquire an appropriate set of if-then rules, which I am unconsciously following, or is my mastery of whisk and pan grounded in some entirely different sort of mechanism? It is certainly true that my actions can be described by means of if-then sentences: if the mixture sticks then I flick the pan, and so on. But it doesn’t follow from this that my actions are produced by some device in my brain scanning through lists of if-then rules of this sort (whereas that is how some computer expert systems work).

Copeland is arguing here that regarding knowledge as production rules may simply be a useful fiction. In a similar vein, Clancey (1997) argues that it is a mistake to equate knowledge with knowledge representation. The latter is an artefact. Architectures such as ACT-R and SOAR are descriptive and language-based, whereas human knowledge is built from activities: “Ways of interacting and viewing the world, ways of structuring time and space, pre-date human language and descriptive theories” (Clancey, 1997, p. 270). Furthermore he argues that too much is missed out of production-system architectures, such as the cultural and social context in which experts operate.

Johnson-Laird (1988a, b) has suggested that production-system architectures can explain a great deal of the evidence that has accrued about human cognition. However, their very generality makes them more like a programming language than a theory and hence difficult to refute experimentally. He also argues (Johnson-Laird, 1988b) that condition-action rules are bound to content (the “condition” part of the production) and so are poor at explaining human abstract reasoning. Furthermore, he has argued (1988b) that regarding expertise as compiled procedures suggests a rigidity of performance that experts do not exhibit (this aspect is discussed further in the next chapter).

Understanding and conceptual change

Some researchers have argued that production models of learning tend to ignore metacognitive skills and the social contexts of learning (see Glaser & Bassok, 1989, for a comparison of learning theories). Brown and Palincsar (1989), for example, state that students learn specific knowledge-acquisition strategies. Group settings encourage understanding and conceptual change and these are contrasted with “situations that encourage automatization, ritualization, or routinization of a skill, in which speed is emphasized at the expense of thought” (Brown & Palincsar, 1989, p. 395). Metacognitive strategies are also emphasised in the study of the role of self-explanations in learning (Chi et al., 1989, 1994). Micheline Chi and colleagues lay stress on the declarative (conceptual) knowledge that underpins understanding and problem solving. For example, Chi and Bassok (1989) argue that “the critical issues are how and to what extent students understand the declarative principles that govern the derivation of a solution” (p. 254).

Cobb and Bowers (1999) also argue that the cognitive approach to learning adopted by Anderson (among others) may not be helpful in deciding what to do in a classroom whereas the situated learning approach is more useful in that context (see Chapter 4 for more on this debate).

NEUROLOGICAL EVIDENCE FOR THE DEVELOPMENT OF PROCEDURAL KNOWLEDGE

Neurological evidence for a distinction between declarative and procedural knowledge has existed for some time (see e.g., Squire, Knowlton, & Musen, 1993). More recently there has been some more detailed evidence of the mechanisms involved in chunking, the learning of condition-action bonds, and the

Documento similar