• No se han encontrado resultados

The delta rule

N/A
N/A
Protected

Academic year: 2023

Share "The delta rule"

Copied!
19
0
0

Texto completo

(1)

The delta rule

MIT Department of Brain and Cognitive Sciences 9.641J, Spring 2005 - Introduction to Neural Networks Instructor: Professor Sebastian Seung

(2)

Learn from your mistakes

(3)

If it ain’t broke, don’t fix it.

(4)

Outline

• Supervised learning problem

• Delta rule

• Delta rule as gradient descent

• Hebb rule

(5)

Supervised learning

Given examples • Find perceptron such that

0,1

R

N

→ { }

x

1

y

1

y

a

= H w (

T

x

a

)

x

2

y

2

x

3

y

3

M

(6)

Example: handwritten digits

• Find a perceptron that detects “two”s.

Figure by MIT OCW.

Image x

Label y

4 0

2 1

0 0

1 0

3

0

(7)

Delta rule

w = η [ y H w (

T

x ) ] x

• Learning from mistakes.

• “delta”: difference between desired and actual output.

• Also called “perceptron learning rule”

(8)

Two types of mistakes

• False positive

y = 0, H w (

T

x ) = 1

– Make w less like x.

w = − η x

• False negative

y = 1, H w (

T

x ) = 0

– Make w more like x.

w = η x

• The update is always proportional to x.

(9)

Objective function

• Gradient update

e

T

w = − η ∂ e w, ( x, y ) = y H w (

T

x ) w x

w

• Stochastic gradient descent on

e w, x, y )

E ( w ) = (

E=0 means no mistakes.

(10)

Perceptron convergence theorem

• Cycle through a set of examples.

• Suppose a solution with zero error exists.

• The perceptron learning rule finds a solution in finite time.

(11)

If examples are nonseparable

• The delta rule does not converge.

• Objective function is not equal to the number of mistakes.

• No reason to believe that the delta rule minimizes the number of mistakes.

(12)

Memorization & generalization

• Prescription: minimize error on the training set of examples

• What is the error on a test set of examples?

• Vapnik-Chervonenkis theory

– assumption: examples are drawn from a probability distribution

– conditions for generalization

(13)

contrast with Hebb rule

w = η yxw = η ( y y ) x

• Assume that the teacher can drive the perceptron to produce the desired

output.

• What are the objective functions?

(14)

Is the delta rule biological?

• Actual output: anti-Hebbian

w = − η H w (

T

x ) x

• Desired output: Hebbian

w = η yx

• Contrastive

(15)

Objective function

• Hebb rule

– distance from inputs

• Delta rule

– error in reproducing the output

(16)

Supervised vs. unsupervised

• Classification vs. generation

• I shall not today attempt further to define the kinds of material

[pornography] … but I know it when I see it.

– Justice Potter Stewart

(17)

Smooth activation function

• same except for slope of f

f w

T

x ) [ y f w

T

x

w = η ′ ( ( ) ] x

• update is small when the argument of f has large magnitude.

(18)

Objective function

• Gradient update

w = − η ∂ e w, x , y ) =

2 [ y f w x

e ( 1 ( )

T

]

2

w

• Stochastic gradient descent on

e w, x, y )

E ( w ) = (

E=0 means zero error.

(19)

Smooth activation functions are important for generalizing

the delta rule to multilayer

perceptrons.

Referencias

Documento similar

En segundo lugar, el Grado también interesará a todas aquellas personas que ya trabajan al servicio de las Administraciones Públicas pero a quienes la obtención de un