Pattern recognition methods can be broadly classified into two distinct groups, statistical and syntactic. This thesis is primarily concerned with statistical pattern recognition rather than syntactic recognition. Many methods have been developed in the statistical pattern recognition field, and an introduction to many of them may be found in [26]; these include kernel methods, nearest neighbour methods, Fisher discriminants and so forth. The Bayesian basis of pattern recognition and Fisher's approach are briefly described here.
5.3.1 Bayesian Classification
Using the notation of the introduction, begin by assuming that the probability that an object comes from class $\omega_j$ is a known $P(\omega_j)$. As this is an overall probability known before an observation vector $x$ has been observed, it is a prior probability. Once an observation is made and the observation vector $x$ is known, we can compare the probabilities of belonging to each class given $x$ and classify according to whichever is largest:

$$x \in \Omega_k \quad \text{if} \quad P(\omega_k \mid x) > P(\omega_j \mid x) \quad \text{for all } j \neq k,$$

where $\Omega_k$ is the set of objects in the $k$th class. This rule is known as Bayes' minimum error rule. The $P(\omega_j \mid x)$ are known as the a posteriori probabilities. Sadly, these are not normally known and must be estimated. This estimation can be done by making use of samples of known classification, as is pursued later in this thesis. In many cases, however, Bayes' theorem is applied to give:
$$P(\omega_i \mid x) = \frac{P(x \mid \omega_i)\,P(\omega_i)}{P(x)}$$

Yielding:

$$x \in \Omega_k \quad \text{if} \quad P(x \mid \omega_k)\,P(\omega_k) > P(x \mid \omega_j)\,P(\omega_j) \quad \text{for all } j \neq k.$$

As before, however, the $P(x \mid \omega_j)$ are not often known.

It is often the case, however, that an incorrect classification may be of varying importance as a function of class. For example, in medical diagnosis it is often a minor problem to classify a healthy patient as unwell, but it is very dangerous to classify an unwell patient as well.
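To build in this factor a cost function $C_{ji}$ may be defined, which gives the cost of misclassifying an object from class $\omega_i$ as from class $\omega_j$. If $x$ belongs to class $\omega_i$ the expected cost is:

$$r_i = \sum_{j=1}^{N} C_{ji} \int_{\Omega_j} P(x \mid \omega_i)\,dx$$

The overall expected cost or risk is thus:

$$r = \sum_{i=1}^{N} P(\omega_i)\,r_i = \sum_{i=1}^{N} P(\omega_i) \sum_{j=1}^{N} C_{ji} \int_{\Omega_j} P(x \mid \omega_i)\,dx$$

This is minimised by defining $\Omega_k$ such that $x \in \Omega_k$ whenever:

$$\sum_{i=1}^{N} C_{ki}\,P(x \mid \omega_i)\,P(\omega_i) < \sum_{i=1}^{N} C_{ji}\,P(x \mid \omega_i)\,P(\omega_i) \quad \text{for all } j \neq k.$$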
This is the Bayes minimum risk decision rule.
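The difference between the minimum error rule and the minimum risk rule can be illustrated with a small Python sketch. It assumes that the priors and the class-conditional likelihoods at one observed $x$ are already known; the function names, the numbers and the cost matrix are hypothetical and chosen only to mirror the medical diagnosis example, with `cost[j][i]` playing the role of the cost of assigning an object from class $i$ to class $j$.

```python
def posteriors(priors, likelihoods):
    """Bayes' theorem: P(w_i | x) = P(x | w_i) P(w_i) / P(x)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)  # P(x), the normalising constant
    return [j / evidence for j in joint]


def min_error_class(priors, likelihoods):
    """Bayes minimum error rule: choose the class with the largest posterior."""
    post = posteriors(priors, likelihoods)
    return max(range(len(post)), key=lambda k: post[k])


def min_risk_class(priors, likelihoods, cost):
    """Bayes minimum risk rule.

    cost[j][i] is the cost of assigning an object from class i to class j.
    The chosen class k minimises sum_i cost[k][i] * P(x | w_i) * P(w_i).
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    expected = [sum(c * j for c, j in zip(row, joint)) for row in cost]
    return min(range(len(expected)), key=lambda k: expected[k])


# Hypothetical numbers: class 0 = healthy, class 1 = unwell.
priors = [0.9, 0.1]
likelihoods = [0.3, 0.6]   # P(x | healthy), P(x | unwell) at one observed x
cost = [[0.0, 10.0],       # decide "healthy": costly if the patient is unwell
        [1.0, 0.0]]        # decide "unwell": minor cost if the patient is healthy

error_decision = min_error_class(priors, likelihoods)
risk_decision = min_risk_class(priors, likelihoods, cost)
print(error_decision, risk_decision)
```

With these assumed values the two rules disagree: the posterior favours the healthy class, but the high cost of missing an unwell patient moves the minimum risk decision to the unwell class.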
5.3.2 Classical Fisher Statistical Pattern Recognition
Fisher in 1936 founded the classical approach to discriminant analysis with Fisher's criterion. The problem is one of finding that direction in the discriminant space along which the two groups to be classified are maximally separated. Fisher defined the separation between the two groups in a particular direction as the distance between the means of the two groups standardised for the within group variance in the specified direction. The importance of this standardisation may be appreciated by consideration of the following example.
Prior to standardisation the separation in the $x_1$ direction appears greater than that in the $x_2$ direction (Fig. 5.2).
After standardisation it is however clear that the separation in the $x_2$ direction is greater.
In general, standardisation may be performed in any direction, and the problem is to find the direction $v$ such that $(v^t\bar{x}_1 - v^t\bar{x}_2)$ is maximised relative to the standard deviation $(v^t S v)^{1/2}$ in that direction, where $\bar{x}_i$ is the sample mean of the design set for class $\omega_i$ ($i = 1, 2$), and $S$ is the assumed common sample variance-covariance matrix.
Figure 5.2: Prior to standardisation

Figure 5.3: After standardisation
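For each class, if we have a set of observed sample vectors $\sigma_1, \ldots, \sigma_n$ then:

$$\bar{x}_i = \frac{1}{n} \sum_{p=1}^{n} \sigma_p$$

and:

$$S = \frac{1}{n-1} \sum_{p=1}^{n} (\sigma_p - \bar{x}_i)(\sigma_p - \bar{x}_i)^t$$

To find $v$, maximise with respect to $v$:

$$\frac{v^t\bar{x}_1 - v^t\bar{x}_2}{(v^t S v)^{1/2}}$$

Differentiating this with respect to $v$ and equating it to zero gives:

$$\bar{x}_1 - \bar{x}_2 = \frac{v^t(\bar{x}_1 - \bar{x}_2)}{v^t S v}\,S v$$

As only the direction is required:

$$v \propto S^{-1}(\bar{x}_1 - \bar{x}_2)$$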
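A minimal Python sketch of this calculation follows, assuming the standard solution $v \propto S^{-1}(\bar{x}_1 - \bar{x}_2)$ of the maximisation. It is restricted to two features so that the $2 \times 2$ matrix can be inverted in closed form, and it assumes that the common matrix $S$ is estimated by pooling the two class scatter matrices. The function names and the sample data are hypothetical; the data are built so that the raw separation is larger in $x_1$ while the standardised separation is larger in $x_2$.

```python
def mean(vectors):
    """Sample mean of a list of 2-D vectors."""
    n = len(vectors)
    return [sum(v[0] for v in vectors) / n, sum(v[1] for v in vectors) / n]


def scatter(vectors, centre):
    """Sum of outer products (s - centre)(s - centre)^t as a 2x2 matrix."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        d = [v[0] - centre[0], v[1] - centre[1]]
        for r in range(2):
            for c in range(2):
                m[r][c] += d[r] * d[c]
    return m


def fisher_direction(class1, class2):
    """Return (v, m1, m2, S) with v = S^{-1} (m1 - m2) for 2-D data."""
    m1, m2 = mean(class1), mean(class2)
    s1, s2 = scatter(class1, m1), scatter(class2, m2)
    dof = len(class1) + len(class2) - 2
    # Assumption: S is the pooled estimate of the common covariance matrix.
    S = [[(s1[r][c] + s2[r][c]) / dof for c in range(2)] for r in range(2)]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    if det == 0:
        raise ValueError("covariance matrix is singular")
    d = [m1[0] - m2[0], m1[1] - m2[1]]
    # Closed-form inverse of a 2x2 matrix applied to the mean difference.
    v = [(S[1][1] * d[0] - S[0][1] * d[1]) / det,
         (S[0][0] * d[1] - S[1][0] * d[0]) / det]
    return v, m1, m2, S


def separation(v, m1, m2, S):
    """Fisher's criterion: (v^t m1 - v^t m2) / (v^t S v)^(1/2)."""
    num = v[0] * (m1[0] - m2[0]) + v[1] * (m1[1] - m2[1])
    Sv = [S[0][0] * v[0] + S[0][1] * v[1], S[1][0] * v[0] + S[1][1] * v[1]]
    return num / (v[0] * Sv[0] + v[1] * Sv[1]) ** 0.5


# Hypothetical design sets: wide spread in x1, narrow spread in x2.
class1 = [(0.0, 1.0), (10.0, 1.2), (0.0, 1.2), (10.0, 1.0)]
class2 = [(3.0, 0.0), (13.0, 0.2), (3.0, 0.2), (13.0, 0.0)]

v, m1, m2, S = fisher_direction(class1, class2)
sep_v = abs(separation(v, m1, m2, S))
sep_x1 = abs(separation([1.0, 0.0], m1, m2, S))
sep_x2 = abs(separation([0.0, 1.0], m1, m2, S))
print(v, sep_v, sep_x1, sep_x2)
```

For these assumed data the class means differ by 3 units in $x_1$ and by 1 unit in $x_2$, yet the computed direction is dominated by its $x_2$ component, because the within-group variance in $x_1$ is far larger.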
5.3.3 Limitations of the Classical Approaches
It is difficult to generalise over so many different approaches to the pattern recognition problem. However, in general the computational load required to perform multiclass multivariate recognition is prohibitive with many of the classical methods. The computational form is also often not suitable for parallel implementation, and it is often very difficult to create an adaptive method which will optimise itself easily as more data arrives, without a complete recalculation. Most of the popular classical techniques are parameterised distribution techniques, which assume that the problem closely follows an a priori probability distribution (such as the Gaussian); although some methods allow the distribution parameters to vary as a function of class, the distribution form is usually fixed.
This assumption is often invalid, as the probability distributions inherent in the problem are often highly anisotropic and both class and observation dependent. For example, the variation found in handwritten characters does not arise from purely random processes; the different observed vector directions, such as displacements or speeds, may correlate in the way they vary. The non-parametric methods such as nearest neighbour normally become computationally very heavy, as they may require multidimensional search operations. These problems have led to the search for computationally lighter, adaptive, parallel methods which require fewer a priori assumptions. Connectionist models avoid these limitations as they are non-parametric and make weaker assumptions about the shapes of the underlying distributions than traditional statistical pattern recognisers.
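The search cost of nearest neighbour methods mentioned above can be seen in a minimal Python sketch. It assumes a brute-force search with squared Euclidean distance over a stored design set; the function name, the labels and the data are hypothetical. Every stored vector is visited for each query, so the work per classification grows with both the size of the design set and the dimension of the observation vector.

```python
import math


def nearest_neighbour(query, design_set):
    """Return the label of the design vector closest to `query`.

    Brute-force search: one pass over every stored vector, so a single
    query costs on the order of len(design_set) * len(query) operations.
    """
    best_label, best_dist = None, math.inf
    for vector, label in design_set:
        # Squared Euclidean distance; the square root is not needed
        # because it does not change which vector is nearest.
        dist = sum((q - x) ** 2 for q, x in zip(query, vector))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical design set of labelled 2-D observation vectors.
design_set = [((0.0, 0.0), "class1"), ((0.2, 0.1), "class1"),
              ((1.0, 1.0), "class2"), ((0.9, 1.2), "class2")]

label = nearest_neighbour((0.8, 0.9), design_set)
print(label)
```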