In many of the network models that we shall discuss, it is useful to describe certain quantities in terms of vectors. Think of a neural network composed of several layers of identical processing elements. If a particular layer contains n units, the outputs of that layer can be thought of as an n-dimensional vector, $\mathbf{x} = (x_1, x_2, \ldots, x_n)^t$, where the t superscript means transpose. In our notation, vectors written in boldface type, such as x, will be assumed to be column vectors. When they are written in row form, the transpose symbol will be added to indicate that the vector is actually to be thought of as a column vector. Conversely, the notation $\mathbf{x}^t$ indicates a row vector.
Suppose the n-dimensional output vector of the previous paragraph provides the input values to each unit in an m-dimensional layer (a layer with m units). Each unit on the m-dimensional layer will have n weights associated with the connections from the previous layer. Thus, there are m n-dimensional weight vectors associated with this layer; there is one n-dimensional weight vector for each of the m units. The weight vector of the ith unit can be written as $\mathbf{w}_i = (w_{i1}, w_{i2}, \ldots, w_{in})^t$. A superscript can be added to the weight notation to distinguish between weights on different layers.
The net input to the ith unit can be written in terms of the inner product, or dot product, of the input vector and the weight vector. For vectors of equal dimensions, the inner product is defined as the sum of the products of the corresponding components of the two vectors. In the notation of the previous section,

$$\mathrm{net}_i = \sum_{j=1}^{n} x_j w_{ij}$$

where n is the number of connections to the ith unit. This equation can be written succinctly in vector notation as

$$\mathrm{net}_i = \mathbf{x} \cdot \mathbf{w}_i$$

or

$$\mathrm{net}_i = \mathbf{x}^t \mathbf{w}_i$$

Also note that, because of the rules of multiplication of vectors,

$$\mathbf{x}^t \mathbf{w}_i = \mathbf{w}_i^t \mathbf{x}$$
We shall often speak of input vectors and output vectors and weight vectors, but we tend to reserve the vector notation for cases where it is particularly appropriate. Additional vector concepts will be introduced later as needed. In the next section, we shall use the notation presented here to describe a neural-network model that has an important place in history: the perceptron.
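As a concrete illustration of the net-input calculation discussed in this section, the following minimal Python sketch computes the inner product of an input vector and a weight vector, and then the net inputs for a whole layer. The function names `net_input` and `layer_net_inputs` and the numeric values are illustrative assumptions, not part of the text.

```python
def net_input(x, w_i):
    """Net input to one unit: sum of products of corresponding components."""
    if len(x) != len(w_i):
        raise ValueError("input and weight vectors must have equal dimensions")
    return sum(x_j * w_ij for x_j, w_ij in zip(x, w_i))


def layer_net_inputs(x, weights):
    """Net inputs for a layer of m units.

    weights is a list of m weight vectors, each of dimension n = len(x).
    """
    return [net_input(x, w_i) for w_i in weights]


# Hypothetical example: n = 3 inputs feeding a layer with m = 2 units.
x = [1.0, 0.0, 2.0]
weights = [
    [0.5, -1.0, 0.25],  # weight vector of unit 1
    [1.0, 1.0, 1.0],    # weight vector of unit 2
]
nets = layer_net_inputs(x, weights)
```

Because the inner product is symmetric, `net_input(x, w_i)` and `net_input(w_i, x)` return the same value.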
The Perceptron: Part 1
The device known as the perceptron was invented by psychologist Frank Rosenblatt in the late 1950s. It represented his attempt to "illustrate some of the properties of intelligent systems in general, without becoming too deeply enmeshed in the special, and frequently unknown, conditions which hold for particular biological organisms" [29, p. ]. Rosenblatt believed that the connectivity that develops in biological networks contains a large random element. Thus, he took exception to previous analyses, such as the McCulloch-Pitts model, where symbolic logic was employed to analyze rather idealized structures. Rather, Rosenblatt believed that the most appropriate analysis tool was probability theory. He developed a theory of statistical separability that he used to characterize the gross properties of these somewhat randomly interconnected networks.

Figure 1.12 A simple photoperceptron has a sensory (S) area, an association (A) area, and a response (R) area. The connections shown between units in the various areas are illustrative, and are not meant to be an exhaustive representation. (Legend: inhibitory connection; excitatory connection; either inhibitory or excitatory.)
The photoperceptron is a device that responds to optical patterns. We show an example in Figure 1.12. In this device, light impinges on the sensory (S) points of the retina structure. Each S point responds in an all-or-nothing manner to the incoming light. Impulses generated by the S points are transmitted to the associator (A) units in the association layer. Each A unit is connected to a random set of S points, called the A unit's source set, and the connections may be either excitatory or inhibitory. The connections have the possible values +1, -1, and 0. When a stimulus pattern appears on the retina, an A unit becomes active if the sum of its inputs exceeds some threshold value. If active, the A unit produces an output, which is sent to the next layer of units.
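The behavior of a single A unit can be sketched in Python as follows. This is only an illustration under stated assumptions: S-point responses are encoded as 0 or 1, an active A unit is assumed to emit 1 and an inactive one 0, and the helper names `make_source_connections` and `a_unit_output` are hypothetical.

```python
import random


def make_source_connections(num_s_points, source_set_size, rng):
    """Choose a random source set and assign each member a +1 or -1 connection.

    S points outside the source set keep a connection value of 0.
    """
    connections = [0] * num_s_points
    for s in rng.sample(range(num_s_points), source_set_size):
        connections[s] = rng.choice([+1, -1])
    return connections


def a_unit_output(retina, connections, threshold):
    """Return 1 if the summed input strictly exceeds the threshold, else 0.

    retina holds the all-or-nothing S-point responses (0 or 1).
    """
    total = sum(s * c for s, c in zip(retina, connections))
    return 1 if total > threshold else 0


# Hypothetical example: a 4-point retina and hand-picked connections.
active = a_unit_output([1, 1, 0, 1], [+1, +1, -1, 0], threshold=1)
inactive = a_unit_output([1, 0, 1, 0], [+1, +1, -1, 0], threshold=0)

# Randomly wired A unit, seeded so that the wiring is reproducible.
random_connections = make_source_connections(8, 3, random.Random(0))
```

In the second call the excitatory and inhibitory inputs cancel, so the sum does not exceed the threshold and the unit stays inactive.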
In a similar manner, A units are connected to response (R) units in the response layer. The pattern of connectivity is again random between the layers, but there is the addition of inhibitory feedback connections from the response layer to the association layer, and of inhibitory connections between R units. The entire connectivity scheme is depicted in the form of a Venn diagram in Figure 1.13 for a simple perceptron with two R units.

Figure 1.13 This Venn diagram shows the connectivity scheme for a simple perceptron. Each R unit receives excitatory connections from a group of units in the association area that is called the source set of the R unit. Notice that some A units are in the source set for both R units. (Legend: inhibitory connection; excitatory connection; either inhibitory or excitatory.)
This drawing shows that each R unit inhibits the A units in the complement to its own source set. Furthermore, each R unit inhibits the other. These factors aid in the establishment of a single, winning R unit for each stimulus pattern appearing on the retina. The R units respond in much the same way as do the A units. If the sum of their inputs exceeds a threshold, they give an output value of +1; otherwise, the output is -1. An alternative feedback mechanism would connect excitatory feedback connections from each R unit to that R unit's respective source set in the association layer.
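The R-unit response rule can be sketched in Python as shown here. The threshold test follows the description above; the function `winning_r_unit`, however, is a deliberately simplified one-shot stand-in for the inhibitory feedback (it just picks the most strongly excited R unit) and is an assumption of this sketch, not the mechanism itself. All names and values are hypothetical.

```python
def r_unit_output(a_outputs, source_set, threshold):
    """Return +1 if the summed input from the source set exceeds threshold, else -1."""
    total = sum(a_outputs[j] for j in source_set)
    return +1 if total > threshold else -1


def winning_r_unit(a_outputs, source_sets):
    """Simplified stand-in for mutual inhibition between R units.

    Returns the index of the R unit whose source set delivers the largest
    summed input; ties go to the lowest index.
    """
    totals = [sum(a_outputs[j] for j in s) for s in source_sets]
    return max(range(len(totals)), key=totals.__getitem__)


# Hypothetical example: five A units, two R units whose source sets overlap
# in A unit 2, as in the Venn diagram.
a_outputs = [1, 1, 0, 1, 0]
source_sets = [[0, 1, 2], [2, 3, 4]]

responses = [r_unit_output(a_outputs, s, threshold=1) for s in source_sets]
winner = winning_r_unit(a_outputs, source_sets)
```

Here the first R unit receives a summed input of 2 and the second a summed input of 1, so only the first exceeds the threshold.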
A system such as the one just described can be used to classify patterns appearing on the retina into categories, according to the number of response units in the system. Patterns that are sufficiently similar should excite the same R unit. Thus, the problem is one of separability: Is it possible to construct a perceptron such that it can successfully distinguish between different pattern classes? The answer is "yes," but with certain conditions that we shall explore later.
The perceptron was a learning device. In its initial configuration, the perceptron was incapable of distinguishing the patterns of interest; through a training process, however, it could learn this capability. Training involved a reinforcement process whereby the output of A units was either increased or decreased depending on whether or not the A units contributed to the correct response of the perceptron for a given pattern. A pattern was applied to the retina, and the stimulus was propagated through the layers until a response unit was activated. If the correct response unit was active, the output of the contributing A units was increased. If the incorrect R unit was active, the output of the contributing A units was decreased.
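One way to picture this reinforcement scheme is the following Python sketch. It rests on several assumptions that are not in the text: each A unit carries a single adjustable output strength in `values`, the active R unit is taken to be the one with the largest summed input, the adjustment is a fixed `step`, and the "contributing" A units are the active members of the winning R unit's source set. The function name `train_step` is hypothetical.

```python
def train_step(values, a_active, source_sets, correct_r, step=0.5):
    """Apply one reinforcement step and return the index of the active R unit.

    values[j]   -- adjustable output strength of A unit j (modified in place)
    a_active[j] -- 1 if A unit j is active for the current pattern, else 0
    source_sets -- source_sets[r] lists the A units feeding R unit r
    correct_r   -- index of the R unit that should respond to this pattern
    """
    totals = [sum(values[j] * a_active[j] for j in s) for s in source_sets]
    winner = max(range(len(totals)), key=totals.__getitem__)
    # Reward contributing A units for a correct response, penalize otherwise.
    sign = +1 if winner == correct_r else -1
    for j in source_sets[winner]:
        if a_active[j]:
            values[j] += sign * step
    return winner


# Hypothetical example: four A units, two R units with disjoint source sets.
values = [1.0, 1.0, 1.0, 1.0]
source_sets = [[0, 1], [2, 3]]
a_active = [1, 0, 1, 1]

first = train_step(values, a_active, source_sets, correct_r=0)
after_first = list(values)
second = train_step(values, a_active, source_sets, correct_r=0)
```

On the first presentation the wrong R unit responds, so its active A units are weakened; on the second presentation the correct R unit responds and its active A unit is strengthened.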
Using such a scheme, Rosenblatt was able to show that the perceptron could classify patterns successfully in what he termed a differentiated environment, where each class consisted of patterns that were in some sense similar to one another. The perceptron was also able to respond consistently to random patterns, but its accuracy diminished as the number of patterns that it attempted to learn increased.
Rosenblatt's work resulted in the proof of an important result known as the perceptron convergence theorem. The theorem is proved for a perceptron with one R unit that is learning to differentiate patterns of two distinct classes. It states, in essence, that, if the classification can be learned by the perceptron, then the procedure we have described guarantees that it will be learned in a finite number of training cycles.
Unfortunately, perceptrons caused a fair amount of controversy at the time they were described. Unrealistic expectations and exaggerated claims no doubt played a part in this controversy. The end result was that the field of artificial neural networks was almost entirely abandoned, except by a few die-hard researchers. We hinted at one of the major problems with perceptrons when we suggested that there were conditions attached to the successful operation of the perceptron. In the next section, we explore and evaluate these considerations.
Exercise 1.4: Consider a perceptron with one R unit and association units,