

We concentrate on algorithms that update the augmented weight vector $a_t$ by adding a suitable positive amount in the direction of the misclassified (according to an appropriate condition) training pattern $y_k$. The general form of such an update rule is

\[
a_{t+1} = \frac{a_t + \eta_t f_t y_k}{N_{t+1}} , \tag{5.6}
\]

where $\eta_t$ is the learning rate, which could depend (usually explicitly) on the number $t$ of updates that have taken place so far, and $f_t$ is an implicit function of the current step (update) $t$, possibly involving the current weight vector $a_t$ and/or the current misclassified pattern $y_k$, which we require to be positive and bounded, i.e.

\[
0 < f_{\min} \le f_t \le f_{\max} . \tag{5.7}
\]

We also allow for the possibility of normalising the newly produced weight vector $a_{t+1}$ to a desirable length through a factor $N_{t+1}$. For the Perceptron algorithm $\eta_t$ is constant, $f_t = 1$ and $N_{t+1} = 1$. Each time the predefined misclassification condition is satisfied by a training pattern the algorithm proceeds to the update of the weight vector. Thus, $t$ (also called “time”) keeps track of the number of updates, which coincides with the number of mistakes (satisfactions of the misclassification condition). In the present section we adopt the convention of initialising $t$ from 1.
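The update rule (5.6) and its Perceptron special case can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function names, the toy patterns, and the choice of the zero-threshold condition $a_t \cdot y_k \le 0$ are assumptions made here, and each pattern $y_k$ is assumed to already carry its label sign so that a correct classification means a positive inner product.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def update(a, y, eta, f=1.0, n=1.0):
    """One step of rule (5.6): a_{t+1} = (a_t + eta_t * f_t * y_k) / N_{t+1}."""
    return [(ai + eta * f * yi) / n for ai, yi in zip(a, y)]

def perceptron(patterns, eta=1.0, max_updates=1000):
    """Perceptron special case: constant eta_t, f_t = 1, N_{t+1} = 1.

    Assumption: the misclassification condition is a_t . y_k <= 0.
    Returns the final weight vector and the number of updates (mistakes).
    """
    a = [0.0] * len(patterns[0])    # a_1 = 0
    t = 1                           # convention: t is initialised from 1
    changed = True
    while changed and t <= max_updates:
        changed = False
        for y in patterns:
            if dot(a, y) <= 0.0:    # condition satisfied: update
                a = update(a, y, eta)
                t += 1
                changed = True
    return a, t - 1

# Hypothetical, linearly separable toy patterns (label sign already absorbed).
patterns = [[1.0, 2.0], [2.0, -1.0], [-0.5, 1.0]]
weights, mistakes = perceptron(patterns)
```

Because $a_1 = 0$ makes the inner product vanish, the first pattern presented triggers an update under this condition, so the counter $t$ advances exactly once per mistake.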

A sufficiently general form of the misclassification condition is

\[
u_t \cdot y_k \le C(t) , \tag{5.8}
\]

where $u_t$ is the weight vector $a_t$ normalised to unit length and $C(t) > 0$ if we require that the algorithm achieves a positive margin. If $a_1 = 0$ we treat the first pattern in the sequence as misclassified. We distinguish two cases depending on whether $C(t)$ is bounded from above by a strictly decreasing function of $t$ which tends to zero or remains bounded from above and below by positive constants.

In the first case the minimum directional margin required by such a condition becomes lower than any fixed value provided $t$ is large enough. Algorithms with such a condition have the advantage of achieving some fraction of the unknown existing margin provided they converge. Examples of such algorithms are the well-known standard Perceptron algorithm with margin [15, 35, 39, 56], ALMA$_2$ [21], CRAMMA [57] and MICRA [58].

In the standard Perceptron algorithm with margin the misclassification condition takes the form

\[
u_t \cdot y_k \le \frac{b}{\|a_t\|} , \tag{5.9}
\]

where $c_1 (t-1) \le \|a_t\| \le c_2 \sqrt{t-1}$ with $b$, $c_1$, $c_2$ positive constants (see Section 5.5). In the ALMA$_2$ algorithm the misclassification condition is

\[
u_t \cdot y_k \le \frac{b}{\|a_t\| \sqrt{t}} , \tag{5.10}
\]

in which $c_3 \sqrt{t-1} \le \|a_t\| \le R$ with $b$, $c_3$ positive constants (see Section 5.6). Notice the striking similarity characterising the behaviour of $C(t)$ in the Perceptron and ALMA$_2$ algorithms.
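To make the similarity explicit, one can substitute the stated bounds on $\|a_t\|$ into (5.9) and (5.10). The following is a direct consequence of those bounds for $t > 1$ (so that the lower bounds on $\|a_t\|$ are non-zero), not an additional result from the source. For the Perceptron with margin,

\[
\frac{b}{c_2 \sqrt{t-1}} \le C(t) = \frac{b}{\|a_t\|} \le \frac{b}{c_1 (t-1)} ,
\]

and for ALMA$_2$,

\[
\frac{b}{R \sqrt{t}} \le C(t) = \frac{b}{\|a_t\| \sqrt{t}} \le \frac{b}{c_3 \sqrt{t (t-1)}} .
\]

In both cases $C(t)$ is squeezed between a lower bound decaying as $1/\sqrt{t}$ and an upper bound decaying as $1/t$.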

In the second case the condition amounts to requiring a directional margin, assumed to exist, which is not lowered arbitrarily with $t$. In particular, if $C(t)$ is equal to a constant $\beta$ [56], (5.8) becomes

\[
u_t \cdot y_k \le \beta \tag{5.11}
\]

and successful termination of the algorithm leads to a solution with margin larger than $\beta$. Obviously, convergence is not possible unless $\beta < \gamma_d$. In this case an organised search through the range of possible $\beta$ values is necessary.
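One way such a search could be organised is bisection over $\beta$, sketched below. This is an illustration under stated assumptions rather than the procedure of [56]: the text does not say how the search is organised, so the bisection, the update budget `max_updates` used to declare a trial unsuccessful, the upper end of the search range, the function names and the toy patterns are all choices made here. A trial that exhausts its budget may simply have needed more updates, so the budget can reject feasible values of $\beta$.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def fixed_margin_perceptron(patterns, beta, eta=1.0, max_updates=10000):
    """Perceptron update with condition (5.11): u_t . y_k <= beta.

    Returns the weight vector after a full pass without updates,
    or None if the (assumed) update budget is exhausted first.
    """
    a = [0.0] * len(patterns[0])
    updates = 0
    while updates < max_updates:
        clean_pass = True
        for y in patterns:
            n = norm(a)
            # A zero weight vector has no direction: treat the pattern as misclassified.
            if n == 0.0 or dot(a, y) / n <= beta:
                a = [ai + eta * yi for ai, yi in zip(a, y)]
                updates += 1
                clean_pass = False
        if clean_pass:
            return a
    return None

def search_beta(patterns, steps=12):
    """Bisection over beta (an assumed search strategy)."""
    # No unit vector can have margin above the smallest pattern norm.
    lo, hi = 0.0, min(norm(y) for y in patterns)
    best = None
    for _ in range(steps):
        beta = 0.5 * (lo + hi)
        a = fixed_margin_perceptron(patterns, beta)
        if a is None:
            hi = beta            # trial failed within budget: lower the target
        else:
            lo, best = beta, a   # trial succeeded: raise the target
    return lo, best

# Hypothetical toy patterns with the label sign already absorbed.
patterns = [[1.0, 2.0], [2.0, -1.0], [-0.5, 1.0]]
beta_found, weights = search_beta(patterns)
achieved = min(dot(weights, y) for y in patterns) / norm(weights)
```

Each successful trial returns a solution whose directional margin exceeds the $\beta$ it was run with, which is the property the bisection relies on when it raises the lower end of the range.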

An alternative classification of the algorithms with the perceptron-like update rule (5.6) is according to the dependence on $t$ of the “effective” learning rate [57]

\[
\eta_{\mathrm{eff}\,t} \equiv \eta_t \frac{R}{\|a_t\|} \tag{5.12}
\]

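which controls the impact that an update has on the current weight vector. More specifically, $\eta_{\mathrm{eff}\,t}$ determines the update of the direction $u_t$

\[
u_{t+1} = \frac{u_t + \eta_{\mathrm{eff}\,t} f_t y_k / R}{\left\| u_t + \eta_{\mathrm{eff}\,t} f_t y_k / R \right\|} . \tag{5.13}
\]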
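The direction update (5.13) follows from dividing (5.6) by $\|a_t\|$ and normalising. The short numerical check below illustrates this: it applies (5.6) to a weight vector and compares the resulting direction with the one produced by (5.13). The numbers and function names are hypothetical, and $R$ is assumed here to be an upper bound on the pattern norms, which this excerpt does not state explicitly.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def unit(v):
    n = norm(v)
    return [x / n for x in v]

def effective_rate(eta, R, a):
    """Effective learning rate (5.12): eta_t * R / ||a_t||."""
    return eta * R / norm(a)

def direction_update(u, y, eta_eff, R, f=1.0):
    """Direction update (5.13)."""
    return unit([ui + eta_eff * f * yi / R for ui, yi in zip(u, y)])

# Hypothetical values chosen only for illustration.
a = [3.0, 4.0]
y = [1.0, -2.0]
eta, f = 0.5, 1.0
R = norm(y)  # assumption: R bounds the pattern norms

# Route 1: update the weights with (5.6), N_{t+1} = 1, then normalise.
a_next = [ai + eta * f * yi for ai, yi in zip(a, y)]
u_via_weights = unit(a_next)

# Route 2: update the direction with (5.13).
u_via_direction = direction_update(unit(a), y, effective_rate(eta, R, a), R, f)

max_gap = max(abs(p - q) for p, q in zip(u_via_weights, u_via_direction))
```

Any positive normalisation factor $N_{t+1}$ rescales $a_{t+1}$ without changing its direction, which is why it does not appear in (5.13).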

Again we distinguish two categories depending on whether $\eta_{\mathrm{eff}\,t}$ is bounded from above by a strictly decreasing function of $t$ which tends to zero or remains bounded from above and below by positive constants. We do not consider the case in which $\eta_{\mathrm{eff}\,t}$ increases indefinitely with $t$ since, as we will argue in the next section, we do not expect such algorithms always to converge in a finite number of steps.

To the first category belongs the Perceptron algorithm with both the standard misclassification condition (5.9) and the fixed directional margin condition (5.11) [56], in which $\eta_t$ remains constant and $\|a_t\|$ is bounded from below by a positive linear function of $t$. To the same category also belong the ALMA$_2$ algorithm, in which $\eta_t$ decreases as $1/\sqrt{t}$, and MICRA. The similarity of the standard Perceptron with margin and ALMA$_2$ algorithms with respect to the behaviour of $\eta_{\mathrm{eff}\,t}$ is apparent if we consider the bounds obeyed by $\|a_t\|$ in these two cases. Moreover, in both algorithms $\eta_{\mathrm{eff}\,t}$ is proportional to $C(t)$.
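The proportionality can be read off from (5.9), (5.10) and (5.12). Writing $\eta$ for the constant learning rate of the Perceptron and, as a notational assumption made here, $\eta_t = \eta / \sqrt{t}$ for ALMA$_2$, one finds

\[
\text{Perceptron with margin:} \quad \eta_{\mathrm{eff}\,t} = \frac{\eta R}{\|a_t\|} = \frac{\eta R}{b} \, C(t) ,
\qquad
\text{ALMA}_2\text{:} \quad \eta_{\mathrm{eff}\,t} = \frac{\eta R}{\|a_t\| \sqrt{t}} = \frac{\eta R}{b} \, C(t) .
\]

With the lower bounds $\|a_t\| \ge c_1 (t-1)$ and $\|a_t\| \ge c_3 \sqrt{t-1}$ respectively, both effective learning rates are bounded from above by a function that decays as $1/t$ for $t > 1$.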

As an example of algorithms belonging to the second category we mention algorithms with the fixed directional margin condition of (5.11), $\|a_t\|$ normalised to the target margin value $\beta$ and fixed learning rate [56]. To the same category also belongs the CRAMMA algorithm.
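For the first of these examples the constancy of the effective learning rate is immediate from (5.12). Assuming a fixed learning rate $\eta_t = \eta$ (the symbol $\eta$ is notation introduced here) and $\|a_t\| = \beta$ after normalisation,

\[
\eta_{\mathrm{eff}\,t} = \eta \frac{R}{\|a_t\|} = \frac{\eta R}{\beta} ,
\]

which does not depend on $t$.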

In summary, the misclassification condition of a perceptron-like algorithm could, roughly speaking, either be “relaxed” with the number of updates (i.e. with $t$) or remain practically constant. Similarly, the effective learning rate could either be reduced with $t$ or remain practically constant. Thus, we are led to four broad categories of algorithms. In subsequent sections we shall present an analysis of the algorithms mentioned above, which are representative but sufficiently general cases belonging to all of these categories.