Begin by assigning arbitrary values to the weights. From the corresponding point on the weight surface, determine the direction of steepest downward slope. Change the weights slightly so that the new weight vector lies farther down the surface. Repeat the process until the minimum has been reached. This procedure is illustrated in Figure 2.10. Implicit in this method is the assumption that we know in advance what the weight surface looks like. We do not, but we will see shortly how to get around this problem.
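As a minimal sketch of this procedure, the code below runs steepest descent on a surface that *is* known in advance, which is exactly the assumption flagged above. It assumes the quadratic mean-squared-error surface of a linear combiner, $\xi(\mathbf{w}) = \langle d_k^2\rangle + \mathbf{w}^T\mathbf{R}\mathbf{w} - 2\mathbf{p}^T\mathbf{w}$, whose gradient is $2\mathbf{R}\mathbf{w} - 2\mathbf{p}$. The values of `R` and `p`, the step size `mu`, and the helper names `gradient` and `steepest_descent` are hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical example statistics (not from the text): input correlation
# matrix R and cross-correlation vector p for a two-weight linear combiner.
R = np.array([[2.0, 0.5],
              [0.5, 1.0]])
p = np.array([1.0, 0.5])

def gradient(w):
    # Exact gradient of xi(w) = <d^2> + w^T R w - 2 p^T w.
    # Computing it requires R and p, i.e. knowing the surface in advance.
    return 2.0 * R @ w - 2.0 * p

def steepest_descent(w0, mu=0.1, steps=200):
    w = np.array(w0, dtype=float)      # arbitrary starting weights
    for _ in range(steps):
        w = w - mu * gradient(w)       # small step against the gradient (downhill)
    return w

w_final = steepest_descent([3.0, -2.0])
w_star = np.linalg.solve(R, p)         # analytical minimum R^{-1} p, for comparison
print(w_final, w_star)
```

Because the gradient here is computed from `R` and `p`, the sketch needs the same advance knowledge of the surface that the LMS algorithm is designed to avoid.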
Typically, the weight vector does not initially move directly toward the minimum point. The cross-section of the paraboloidal weight surface is usually elliptical, so the negative gradient may not point directly at the minimum point, at least initially. The situation is illustrated more clearly in a contour plot of the weight surface.
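A small numerical check makes the elliptical-cross-section point concrete. The sketch assumes a hypothetical surface $\xi(\mathbf{w}) = \xi_{\min} + (\mathbf{w}-\mathbf{w}^*)^T\mathbf{R}(\mathbf{w}-\mathbf{w}^*)$ with unequal eigenvalues in `R`, so that its contours are ellipses, and measures the angle between the negative gradient and the straight line to the minimum. The helpers `neg_gradient` and `angle_to_minimum` are illustrative, not from the text.

```python
import numpy as np

# Hypothetical elongated bowl: unequal eigenvalues in R give elliptical contours.
R = np.diag([1.0, 10.0])
w_star = np.array([0.0, 0.0])

def neg_gradient(w):
    # -grad xi(w) = -2 R (w - w_star): the steepest-descent direction
    return -2.0 * R @ (w - w_star)

def angle_to_minimum(w):
    # Angle in degrees between the steepest-descent direction and the
    # straight line from w to the minimum.
    g = neg_gradient(w)
    to_min = w_star - w
    cos = g @ to_min / (np.linalg.norm(g) * np.linalg.norm(to_min))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

print(angle_to_minimum(np.array([1.0, 1.0])))  # off-axis start: roughly 39 degrees
print(angle_to_minimum(np.array([1.0, 0.0])))  # start on a principal axis: 0
```

The angle is zero only when the starting point lies on one of the ellipse's principal axes; elsewhere the negative gradient favors the more steeply curved direction (here $w_2$) instead of heading straight for $\mathbf{w}^*$.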
Figure 2.10 We can use this diagram to visualize the steepest-descent method. An initial selection for the weight vector results in an error, $\xi(\mathbf{w}(0))$. The steepest-descent method consists of sliding this point down the surface toward the bottom, always moving in the direction of steepest descent.
Figure: In the contour plot of the weight surface of Figure 2.10, the direction of steepest descent is perpendicular to the contour lines at each point, and this direction does not always point to the minimum point.
Because the weight vector is variable in this procedure, we write it as an explicit function of the timestep, $t$. The initial weight vector is denoted $\mathbf{w}(0)$, and the weight vector at timestep $t$ is $\mathbf{w}(t)$. At each step, the next weight vector is calculated according to

$$\mathbf{w}(t+1) = \mathbf{w}(t) + \Delta\mathbf{w}(t) \qquad (2.10)$$

where $\Delta\mathbf{w}(t)$ is the change in $\mathbf{w}$ at the $t$th timestep.
We are looking for the direction of steepest descent at each point on the surface, so we need to calculate the gradient of the surface, which gives the direction of steepest slope. The negative of the gradient points in the direction of steepest descent. To get the magnitude of the change, multiply the gradient by a suitable constant, $\mu$. The appropriate value for $\mu$ will be discussed later. This procedure results in the following expression:

$$\Delta\mathbf{w}(t) = -\mu\,\nabla\xi(\mathbf{w}(t)) \qquad (2.11)$$
2.2 Adaline and the Adaptive Linear Combiner
All that is necessary to complete the discussion is to determine the value of $\nabla\xi(\mathbf{w}(t))$ at each successive iteration step.
The value of $\nabla\xi(\mathbf{w})$ was determined analytically in the previous section. Equation (2.6) or Eq. (2.9) could be used here to determine $\nabla\xi(\mathbf{w}(t))$, but we would have the same problem that we had with the analytical determination of $\mathbf{w}^*$: we would need to know both $\mathbf{R}$ and $\mathbf{p}$ in advance. This knowledge is equivalent to knowing in advance what the weight surface looks like. To circumvent this difficulty, we use an approximation for the gradient that can be determined from information that is known explicitly at each iteration.
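The approximation adopted in the steps below, Eq. (2.14), can be compared numerically with the exact gradient. In this sketch the data model (Gaussian inputs, a hypothetical `w_true`, small additive noise) is an assumption for illustration. The exact gradient $2\mathbf{R}\mathbf{w} - 2\mathbf{p}$ is built from `R` and `p` estimated over the whole data set, which an on-line learner would not have, while each single-sample estimate $-2\varepsilon_k\mathbf{x}_k$ uses only one input and its desired output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data model (an assumption): zero-mean Gaussian inputs x_k and
# desired outputs d_k from a fixed "true" weight vector plus small noise.
n_samples, n_weights = 100_000, 3
X = rng.normal(size=(n_samples, n_weights))
w_true = np.array([0.5, -1.0, 2.0])
d = X @ w_true + 0.1 * rng.normal(size=n_samples)

w = np.zeros(n_weights)  # current weight vector w(t)

# Exact gradient 2Rw - 2p needs R = <x x^T> and p = <d x> over the whole data set.
R = X.T @ X / n_samples
p = X.T @ d / n_samples
exact_grad = 2.0 * R @ w - 2.0 * p

# Single-sample estimates in the form of Eq. (2.14): -2 * eps_k * x_k, one per row.
eps = d - X @ w
single_sample_grads = -2.0 * eps[:, None] * X

print(single_sample_grads[0])              # one noisy estimate
print(single_sample_grads.mean(axis=0))    # average of all estimates
print(exact_grad)                          # exact gradient from R and p
```

Any one estimate is noisy, but their average over the data set reproduces the exact gradient built from the same samples, since $\langle -2\varepsilon_k\mathbf{x}_k\rangle = 2\mathbf{R}\mathbf{w} - 2\mathbf{p}$ when $\mathbf{R}$ and $\mathbf{p}$ are computed over those samples.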
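For each step in the iteration process, we perform the following:

1. Apply an input vector, $\mathbf{x}_k$, to the Adaline inputs.
2. Determine the value of the error squared, $\varepsilon_k^2(t)$, using the current value of the weight vector:
$$\varepsilon_k^2(t) = \left(d_k - \mathbf{w}^T(t)\,\mathbf{x}_k\right)^2 \qquad (2.12)$$
where $d_k$ is the desired output for $\mathbf{x}_k$.
3. Calculate an approximation to $\nabla\xi(t)$ by using $\varepsilon_k^2(t)$ as an approximation for $\langle\varepsilon_k^2\rangle$:
$$\nabla\xi(t) \approx \nabla\varepsilon_k^2(t) \qquad (2.13)$$
$$\nabla\varepsilon_k^2(t) = -2\,\varepsilon_k(t)\,\mathbf{x}_k \qquad (2.14)$$
where we have used Eq. (2.12) to calculate the gradient explicitly, with $\varepsilon_k(t) = d_k - \mathbf{w}^T(t)\,\mathbf{x}_k$.
4. Update the weight vector according to Eqs. (2.10) and (2.11), using Eq. (2.14) as the approximation for the gradient:
$$\mathbf{w}(t+1) = \mathbf{w}(t) + 2\mu\,\varepsilon_k(t)\,\mathbf{x}_k \qquad (2.15)$$
5. Repeat steps 1 through 4 with the next input vector until the error has been reduced to an acceptable value.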
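Steps 1 through 5 can be sketched as a short training loop for a single linear combiner. This is a minimal illustration, not the text's own program: the function name `lms_train`, the default `mu`, the zero initial weights, and the choice to test the mean squared error once per pass through the training set (as the "acceptable value" of step 5) are all assumptions.

```python
import numpy as np

def lms_train(X, d, mu=0.01, tol=1e-6, max_epochs=100, w0=None):
    """Sketch of the LMS procedure (steps 1-5) for a single Adaline/ALC.

    X: (n_samples, n_weights) input vectors x_k; d: desired outputs d_k.
    mu, tol, max_epochs and the once-per-pass MSE stopping test are
    illustrative assumptions, not values taken from the text.
    """
    w = np.zeros(X.shape[1]) if w0 is None else np.asarray(w0, dtype=float)
    for _ in range(max_epochs):
        for x_k, d_k in zip(X, d):
            eps_k = d_k - w @ x_k              # steps 1-2: apply x_k, compute error
            w = w + 2.0 * mu * eps_k * x_k     # steps 3-4: update by Eq. (2.15)
        mse = np.mean((d - X @ w) ** 2)        # step 5: stop once error is acceptable
        if mse < tol:
            break
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([0.5, -1.0, 2.0])   # hypothetical target weights
d = X @ w_true                        # noise-free desired outputs
w = lms_train(X, d)
print(w)
```

With noise-free targets generated from `w_true`, the weights settle on `w_true`; with noisy targets they would instead fluctuate around the minimum-error solution.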
Equation (2.15) is an expression of the LMS algorithm. The parameter $\mu$ determines the stability and speed of convergence of the weight vector toward the minimum-error value.
Because an approximation of the gradient has been used in Eq. (2.15), the path that the weight vector takes as it moves down the weight surface toward the minimum will not be as smooth as the steepest-descent path shown in the contour plot. The figure below shows an example of how a search path might look with the LMS algorithm of Eq. (2.15). Changes in the weight vector must be kept relatively small on each iteration. If changes are too large, the weight vector could wander about the surface, never finding the minimum, or finding it only by accident rather than as a result of a steady convergence toward it. The function of the parameter $\mu$ is to prevent this aimless searching. In the next section, we shall discuss the parameter $\mu$ and other practical considerations.
Figure: The hypothetical path taken by a weight vector as it searches for the minimum error using the LMS algorithm is not a smooth curve, because the gradient is being approximated at each point. Note also that step sizes get smaller as the minimum-error solution is approached.
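The role of $\mu$ described above can be seen by running the same update, Eq. (2.15), with a small and a large step size. The data (two Gaussian inputs, a hypothetical `w_true`, noise-free targets), the helper `lms_weight_error`, the blow-up threshold, and the two `mu` values are assumptions chosen for illustration; the exact range of stable `mu` depends on the input statistics and is not derived here.

```python
import numpy as np

def lms_weight_error(mu, steps=2000, seed=2, blowup=1e6):
    """Run LMS on hypothetical noise-free data and return ||w - w_true||.

    The data model, w_true, and the blow-up threshold are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    w_true = np.array([1.0, -0.5])
    w = np.zeros(2)
    for _ in range(steps):
        x_k = rng.normal(size=2)             # next input vector
        eps_k = w_true @ x_k - w @ x_k       # error, with d_k = w_true^T x_k
        w = w + 2.0 * mu * eps_k * x_k       # LMS update, Eq. (2.15)
        if np.linalg.norm(w) > blowup:       # treat as diverged and stop early
            break
    return np.linalg.norm(w - w_true)

print(lms_weight_error(mu=0.01))   # small steps: error shrinks toward zero
print(lms_weight_error(mu=1.0))    # large steps: weights blow up
```

With `mu=0.01` the weight error shrinks toward zero; with `mu=1.0` each update overshoots along the current input direction, and the weights grow until the divergence threshold stops the loop.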
2.2.2 Practical Considerations
There are several questions to consider when we are attempting to use the ALC to solve a particular problem:
• How many training vectors are required to solve a particular problem?
• How is the expected output generated for each training vector?
• What is the appropriate dimension of the weight vector?
• What should be the initial values for the weights?
• Is a bias weight required?