

$P(L(S_t))$, the prior probability of a frame belonging to the class $L(S_t)$. This assumption is clearly incorrect, but is found to work in practice. This probability can be estimated by counting the number of frames in each class according to the labels of the training set.
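The counting estimate of the class priors can be sketched as follows. This is a minimal illustration, assuming the training labels are available as one class label per frame; the function name `estimate_class_priors` and the toy labels are hypothetical and do not come from the system described here.

```python
from collections import Counter


def estimate_class_priors(frame_labels, classes):
    """Estimate P(class) as the fraction of training frames carrying that label."""
    total = len(frame_labels)
    if total == 0:
        raise ValueError("need at least one labelled frame")
    counts = Counter(frame_labels)
    # Classes that never occur get a prior of 0.0 (Counter returns 0 for them).
    return {c: counts[c] / total for c in classes}


# Hypothetical frame labels for a tiny training set (one label per frame).
labels = ["a", "a", "b", "a", "c", "b"]
priors = estimate_class_priors(labels, classes="abc")
print(priors)
```

A class with a zero count would make the later division by the prior undefined, so a real system would probably need some smoothing; none is shown here.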

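Thus there are two expressions for the likelihood $L(W \mid x_0^\tau)$ of a word, which can be normalized to give word probabilities:

$$P(W \mid x_0^\tau) = \frac{L(W \mid x_0^\tau)}{\sum_{W} L(W \mid x_0^\tau)} \tag{8.13}$$

$$L(W \mid x_0^\tau) \equiv P(W) \sum_{S \in \mathcal{S}(W)} \left( \prod_{t=0}^{\tau} P(x_t \mid L(S_t)) \right) \left( \pi_{S_0} \prod_{t=0}^{\tau-1} a_{S_t, S_{t+1}} \right) \tag{8.14}$$

$$L(W \mid x_0^\tau) \equiv P(W) \sum_{S \in \mathcal{S}(W)} \left( \prod_{t=0}^{\tau} \frac{P(L(S_t) \mid x_0^t)}{P(L(S_t))} \right) \left( \pi_{S_0} \prod_{t=0}^{\tau-1} a_{S_t, S_{t+1}} \right) \tag{8.15}$$

Equation 8.14 is used for the table look-up system and equation 8.15 is used for the recurrent network. For simplicity, the likelihoods $P(x_t \mid L(S_t))$ are used henceforth, but the scaled likelihoods $P(L(S_t) \mid x_0^t) / P(L(S_t))$ are to be understood when the equations are applied to the recurrent network system.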
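The scaled likelihood for a single frame is just the network's posterior divided by the class prior. The sketch below illustrates that division under the assumption that both are held in dictionaries keyed by class label; the function name and the numbers are hypothetical.

```python
def scaled_likelihoods(posteriors, priors):
    """Divide each posterior P(class | input) by the prior P(class)."""
    scaled = {}
    for label, posterior in posteriors.items():
        prior = priors[label]
        if prior <= 0.0:
            raise ValueError(f"class {label!r} has a non-positive prior")
        scaled[label] = posterior / prior
    return scaled


# Hypothetical network outputs for one frame and hypothetical class priors.
posteriors = {"a": 0.6, "b": 0.3, "c": 0.1}
priors = {"a": 0.5, "b": 0.25, "c": 0.25}
scaled = scaled_likelihoods(posteriors, priors)
print(scaled)
```

By Bayes' rule the quotient differs from the likelihood of the frame given the class only by a factor that does not depend on the class, which is why it can stand in for the likelihood when word hypotheses are compared.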

These expressions can be calculated efficiently using the principle of dynamic programming, in an array structure representing the states of the Markov model. In this model, each state is accorded a probability $\alpha_r(t)$, which is the probability of being in state $r$ after $t$ frames have been observed. Thus $\alpha_r(0) = \pi_r$, the initial distribution.

As successive frames of data are fed into the recognizer, and character probabilities are generated, the Markov model forward probabilities are calculated recursively by the formula:

$$\alpha_r(t + 1) = \sum_p \alpha_p(t)\, P(x_t \mid L(q_p))\, a_{p,r} \tag{8.16}$$

until all frames have been processed. At this point the final state (dashed in figure 8.1) contains $P(x_0^\tau \mid W) = \alpha_n(\tau + 1)$, the likelihood that the data $x_0^\tau$ represented the word of this model. By choosing the maximum of the likelihoods, $\arg\max_W L(W \mid x_0^\tau)$, if the models are good, a good estimate of the identity of the original word is obtained.
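The recursion of equation 8.16 can be sketched in a few lines. This is an illustrative version in the probability domain, assuming states are indexed by integers, the final state emits nothing, and its outgoing transitions are zero; the function name, the three-state model and all numbers are hypothetical.

```python
def forward_likelihood(initial, transitions, emissions, final_state):
    """Apply the forward recursion once per frame and return the final-state value.

    initial[r]        -- forward probability of state r before any frame is seen
    transitions[p][r] -- transition probability from state p to state r
    emissions[t][p]   -- likelihood of frame t under the class of state p
    """
    n_states = len(initial)
    alpha = list(initial)
    for frame in emissions:
        # New value for r: sum over p of alpha_p * emission_p * transition p -> r.
        alpha = [
            sum(alpha[p] * frame[p] * transitions[p][r] for p in range(n_states))
            for r in range(n_states)
        ]
    return alpha[final_state]


# Hypothetical model: states 0 and 1 emit frames, state 2 is the final state.
initial = [1.0, 0.0, 0.0]
transitions = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0],  # nothing leaves the final state
]
emissions = [
    [0.9, 0.2, 0.0],  # frame 0
    [0.1, 0.8, 0.0],  # frame 1
]
likelihood = forward_likelihood(initial, transitions, emissions, final_state=2)
print(likelihood)
```

With two frames, the only path that ends in the final state visits state 0 and then state 1, so the result is 0.9 × 0.5 × 0.8 × 0.5 = 0.18. Running one such model per word, multiplying by the word prior and taking the largest value gives the word choice described above.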

All of these probabilities are stored and multiplied in the log domain for speed and numerical accuracy. Multiplications become additions in the log domain. Probability additions can be calculated by using the identity

$$\log(a + b) \equiv \log a + \log\left(1 + \exp(\log b - \log a)\right) \tag{8.17}$$

and deriving the second term from a look-up table, as described by Brown (1987).
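A small sketch of probability addition in the log domain follows. It evaluates the second term of the identity directly with `math.log1p` rather than from a look-up table, so it illustrates the identity and not the table-based implementation; the function name `log_add` is hypothetical.

```python
import math


def log_add(log_a, log_b):
    """Return log(a + b) given log(a) and log(b)."""
    # Put the larger value first so the exponent is never positive (no overflow).
    if log_a < log_b:
        log_a, log_b = log_b, log_a
    # Adding a zero probability (log = -inf) leaves the other term unchanged.
    if log_b == float("-inf"):
        return log_a
    return log_a + math.log1p(math.exp(log_b - log_a))


# Adding probabilities 0.2 and 0.3 in the log domain gives log(0.5).
result = log_add(math.log(0.2), math.log(0.3))
print(result, math.log(0.5))
```

Ordering the arguments keeps the exponent at or below zero, which also bounds the range a look-up table for the second term would have to cover.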

CHAPTER 8. HIDDEN MARKOV MODELLING

8.1.1 Labelling

It will be recalled from chapter 4 that the database consists of both upper and lower case letters as well as punctuation. In fact the punctuation is excluded in the segmentation process, so only word images are passed to the preprocessing system, and no recognition of punctuation is carried out. If this were desired, a separate system for recognizing punctuation marks would be necessary. As punctuation marks appear in isolation and are largely defined by location, the recurrent network apparatus would be inappropriate. A much simpler system could be used, perhaps based on rules for the location and contour shape of punctuation marks.

The system described here gives a distribution across the 26 letter categories, and makes no distinction between upper and lower case letters. An 'a' and an 'A' are both labelled the same, and the network is trained to give the same output for either. There are not enough examples of capital letters in the database to train a network with separate output classes for both upper and lower case letters, since capitals only occur at the beginning of a few words and in a few acronyms. Indeed, the current system recognizes capital letters poorly, but since they are generally only initial letters, recognition is still possible based on the remaining letters and the constraints of the limited vocabulary. Testing a 160-unit network with a grammar gave an 8.8% error rate, but among words with capitals the error rate was 35%. The average rank in the lexicon of incorrectly recognized words with capitals was 96, compared to 15 for incorrect words without capitals. More data with capital letters would improve the recognition rate on capital letters, bringing down the overall error rate.

If more data were available, and distinction between upper and lower case were required, the network could be given 52 outputs to represent the upper and lower case letters. However, it might be better (because the network size would be kept down) to keep just 26 output categories, and have a separate unit indicating the case of the letter. Such a unit would give an independent probability, with a sigmoid output (equivalent to the two-class softmax). When using such a system, the hidden Markov models would need to be adapted to account for the separate classes and, according to the task, models with initial capitals, full capitals or even mixed case words could be permitted.
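The equivalence between a sigmoid output and a two-class softmax can be checked in one line. Here $z_1$ and $z_2$ denote the two softmax inputs; these symbols are introduced only for this illustration and do not appear elsewhere in the text.

```latex
\[
\frac{e^{z_1}}{e^{z_1} + e^{z_2}}
  = \frac{1}{1 + e^{-(z_1 - z_2)}}
  = \sigma(z_1 - z_2),
\qquad
\sigma(u) = \frac{1}{1 + e^{-u}} .
\]
```

A single sigmoid unit driven by the difference $z_1 - z_2$ therefore gives the same probability as the first output of a two-class softmax, so one extra unit is enough to signal case.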

Some systems (Schenkel et al. 1994) have a 'noise' output class to allow the network to indicate that the inputs do not correspond to any of the letter classes. Such a class could be used in this system to represent poor writing or the ligatures between letters, but the implementation would be difficult since there is no noise or ligature class in the labelling of the training data. Since the system accepts cursive and discrete writing, the data would need to be hand-labelled to indicate the presence of ligatures. If such hand-labelling were done, then an optional ligature model could be inserted between the letter models of each word. A noise model could be placed in parallel with
