While Theorem 6.7 is interesting in relating agnostic PAC learning to learning a single hidden layer neural network, there do not appear to be many basis function classes that are properly and efficiently agnostically PAC learnable. Available results show that properly agnostically PAC learning monomials and halfspaces is hard under the assumption RP ≠ NP (Kearns et al. 1994, Höffgen & Simon 1992). This implies that for networks of functions from these classes, it is unlikely that an efficient algorithm can be obtained from the approach given here. Since the quadratic loss is equivalent to the discrete loss when the function class as well as the target functions are {0,1}-valued, the approach given in Section 6.2 for properly learning networks by properly learning the basis functions with the quadratic loss is also unlikely to produce an efficient agnostic learning algorithm for these function classes. However, these results do not rule out efficient agnostic learning using other methods or other hypothesis classes. Ruling those out requires representation-independent hardness results.

In (Kearns et al. 1994), it was shown that if the class of monomials is efficiently agnostically learnable (with any hypothesis class) with respect to the discrete loss function, then the class of polynomial-size DNF is efficiently learnable in the PAC learning model. (It is generally believed that polynomial-size DNF is not likely to be efficiently learnable (Jerrum 1994).) Using techniques similar to those in (Kearns et al. 1994), it is possible to show that if a class of {0,1}-valued basis functions includes monomials, then an efficient agnostic learning algorithm for the class using the quadratic loss function can be used to efficiently find a randomized hypothesis for polynomial-size DNF. (We say a hypothesis h is randomized if there exists a probabilistic polynomial time algorithm that, given h and an instance v, computes h's prediction on v.) If we assume that it is hard to find a learning algorithm for DNF, then agnostically learning such basis function classes, as well as the network of the basis functions, is hard.

The idea behind the proof is to show that the network can be used as a weak PAC learning algorithm for learning p(n)-term DNF. The result then follows from the fact that a {0,1}-valued function class is efficiently PAC learnable if and only if it is efficiently weakly PAC learnable (Schapire 1990).

Definition 6.8 The class of monomials over $n$ Boolean variables $x_1, \ldots, x_n$ consists of all conjunctions of literals over the variables. For any $k$, the class of $k$-term DNF consists of all disjunctions of the form $M_1 \vee \cdots \vee M_k$, where each $M_i$ is a monomial.
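Definition 6.8 can be made concrete with a minimal Python sketch. The encoding of a literal as an `(index, wanted_value)` pair and the names `eval_monomial`, `eval_dnf`, and `example_dnf` are illustrative assumptions, not notation from the text.

```python
from itertools import product

# Assumed encoding: a literal is (index, wanted_value); a monomial is a
# tuple of literals; a k-term DNF is a list of k monomials.

def eval_monomial(monomial, x):
    """Conjunction of literals: 1 iff every literal is satisfied by x."""
    return int(all(x[i] == v for i, v in monomial))

def eval_dnf(terms, x):
    """k-term DNF: 1 iff at least one of its monomials is satisfied by x."""
    return int(any(eval_monomial(m, x) for m in terms))

# Hypothetical 2-term DNF over n = 3 variables (0-based indices):
# (x1 AND NOT x2) OR (x3)
example_dnf = [((0, 1), (1, 0)), ((2, 1),)]
truth_table = {x: eval_dnf(example_dnf, x) for x in product((0, 1), repeat=3)}
```

Here `truth_table` maps each of the 8 Boolean inputs to the value of the formula; the empty tuple `()` encodes the monomial with no literals, which is satisfied by every input.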

Definition 6.9 Let $Q$ be a class of functions mapping from $X$ to $\{0,1\}$. Suppose $Q$ is parametrized by complexity parameter $n$. Then $Q$ is efficiently weakly PAC learnable if there exists a polynomial $p$ and an algorithm $A$ such that for all $n \geq 1$, for all target functions $g \in Q$, for any probability distribution $D$ on $X$, and for all $0 < \delta < 1$, algorithm $A$, given the parameters $n$ and $\delta$, draws instances from $D$ labelled by $g$, runs in time polynomial in $n$ and $1/\delta$, and outputs a hypothesis $h$ that with probability at least $1 - \delta$ has expected error no more than $1/2 - 1/p(n)$.

Theorem 6.10 (Schapire 1990) Let $Q$ be a function class mapping from $X$ to $\{0,1\}$. $Q$ is efficiently weakly PAC learnable if and only if it is efficiently PAC learnable.

For any function $h : X \to [0,1]$, define $\$_h(x)$ to be a boolean random variable that is 1 with probability $h(x)$ and 0 with probability $1 - h(x)$. We will need the following result.

Lemma 6.11 (Kearns et al. 1994) Let $f : X \to \{0,1\}$ be any boolean function, and let $h : X \to [0,1]$ be a real-valued function. Then for any distribution $D$ on $X$,

\[ \Pr\bigl(f(x) \neq \$_h(x)\bigr) \leq \mathbf{E}\bigl[(f(x) - h(x))^2\bigr] + 1/4. \]
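One way to see why the constant $1/4$ appears in Lemma 6.11 is the short derivation below. It is a sketch under the definitions above and is not claimed to be the argument given in (Kearns et al. 1994); the symbol $t$ is introduced here only for illustration.

```latex
% For a fixed x, \$_h(x) is 1 with probability h(x), so
\Pr\bigl(f(x) \neq \$_h(x) \mid x\bigr)
  = \begin{cases} h(x) & \text{if } f(x) = 0,\\ 1 - h(x) & \text{if } f(x) = 1 \end{cases}
  = |f(x) - h(x)| .
% Writing t = |f(x) - h(x)| \in [0,1], the inequality (t - 1/2)^2 \ge 0 gives
t \le t^2 + \tfrac{1}{4} .
% Taking the expectation over x drawn from D:
\Pr\bigl(f(x) \neq \$_h(x)\bigr)
  = \mathbf{E}\bigl[\,|f(x) - h(x)|\,\bigr]
  \le \mathbf{E}\bigl[(f(x) - h(x))^2\bigr] + \tfrac{1}{4} .
```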

Theorem 6.12 Let $Q = \bigcup_{n=1}^{\infty} Q_n$, where each $Q_n$ is a permissible class of $\{0,1\}$-valued functions on a domain containing $\{0,1\}^n$ such that the class of monomials is a subset of $Q_n$, and let $p(n)$ be any polynomial in $n$. If $Q$ is efficiently agnostically learnable with respect to the quadratic loss function, then there exists an efficient algorithm (which produces randomized hypotheses) for learning $p(n)$-term DNF.

Proof. We will show that there exists a weak learning algorithm (which produces randomized hypotheses) for $p(n)$-term DNF. The result then follows from Theorem 6.10.

For any target $p(n)$-term DNF formula, there exists a monomial that never makes an error on a negative example and gets at least $1/p(n)$ of the positive examples right (because the $p(n)$ terms cover all the positive examples). Let $\omega \in Q$ be equivalent to this monomial when restricted to $\{0,1\}^n$. Then $\omega' = \frac{1}{2}(\omega + 1)$, which belongs to the network class $\mathcal{N}$ over $Q$ (Section 6.2), will have quadratic error $1/4$ on the negative examples. On the positive examples the quadratic error of $\omega'$ will be zero when the monomial $\omega$ gives the correct classification and $1/4$ when it gives the wrong classification.
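The covering argument and the error values of $\omega'$ can be checked on a small instance. The sketch below assumes a hypothetical 3-term DNF over 4 variables and the uniform distribution on $\{0,1\}^4$; the names `terms`, `best`, and `omega_prime` are illustrative and do not come from the text.

```python
from fractions import Fraction
from itertools import product

def monomial(x, lits):
    """1 iff x satisfies every (index, wanted_value) literal in lits."""
    return int(all(x[i] == v for i, v in lits))

# Hypothetical target: 3-term DNF over n = 4 variables (0-based indices).
terms = [((0, 1), (1, 1)), ((2, 1), (3, 0)), ((0, 0), (3, 1))]
points = list(product((0, 1), repeat=4))
f = {x: int(any(monomial(x, t) for t in terms)) for x in points}
positives = [x for x in points if f[x] == 1]

# The term that covers the largest share of the positive examples.
best = max(terms, key=lambda t: sum(monomial(x, t) for x in positives))
coverage = Fraction(sum(monomial(x, best) for x in positives), len(positives))

def omega_prime(x):
    """omega' = (omega + 1) / 2, where omega is the chosen monomial."""
    return Fraction(monomial(x, best) + 1, 2)

# Quadratic error of omega' on negative and on positive examples.
sq_err = {x: (f[x] - omega_prime(x)) ** 2 for x in points}
neg_errs = {sq_err[x] for x in points if f[x] == 0}
pos_errs = {sq_err[x] for x in positives}
```

In this instance `coverage` is at least `1/len(terms)`, every negative example has quadratic error `1/4`, and the positive examples have error `0` or `1/4` depending on whether the chosen term covers them.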

The algorithm for producing the randomized hypothesis goes as follows. (The constants are chosen for convenience.) Let $\alpha$ be the probability that an instance is labelled 1. Draw a large enough sample (using e.g. Theorem 6.2) so that with probability at least $1 - \delta/2$, the empirical average $\hat{\alpha}$ is within $\epsilon/2$ of $\alpha$ for some small $\epsilon < 7/(32p(n))$. If the empirical average is less than $1/4 + \epsilon/2$, choose the all zero monomial. If the empirical average is more than $3/4 - \epsilon/2$, choose the all one monomial. Either of these hypotheses will then have error no more than $1/4 + \epsilon$.

Otherwise the probability of a positive example is between $1/4$ and $3/4$. We then use the agnostic learning algorithm to learn the function using the network class $\mathcal{N}$ over $Q$ with quadratic loss. From Section 6.2, $\mathcal{N}$ is efficiently agnostically learnable if $Q$ is efficiently agnostically learnable. The above argument shows that there exists a function in $\mathcal{N}$ with expected quadratic error at most

\[ \frac{1}{4} \cdot \frac{1}{4}\left(1 - \frac{1}{p(n)}\right) + \frac{1}{4}\left(1 - \frac{1}{4}\right) = \frac{1}{4} - \frac{1}{16p(n)}, \]

the bound being largest when the probability of a positive example is $1/4$. Let $f$ be our target DNF. Use the agnostic algorithm to produce a hypothesis $h$ whose expected quadratic error is no more than $1/(32p(n))$ away from the optimum with probability at least $1 - \delta/2$. Then from Lemma 6.11, we have

\[ \Pr[f(x) \neq \$_h(x)] \leq \mathbf{E}[(f(x) - h(x))^2] + 1/4 \leq 1/2 - 1/(32p(n)), \]

where $\$_h(x)$ is a boolean random variable that is 1 with probability $h(x)$ and 0 with probability $1 - h(x)$. The probability of the algorithm failing to produce a hypothesis with error no more than $1/2 - 1/(32p(n))$ is no more than $\delta$. Hence the algorithm is a weak learning algorithm which produces randomized hypotheses for learning $p(n)$-term DNF. □
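The constants in the proof can be verified with exact rational arithmetic. The following sketch only checks the bookkeeping of the bounds; it does not implement the agnostic learner itself. The function names are hypothetical, and the observation that $\epsilon = 7/(32p(n))$ is the largest value keeping the trivial branch within the weak-learning bound is our own arithmetic rather than a statement from the text.

```python
from fractions import Fraction

QUARTER = Fraction(1, 4)

def weak_learner_error_bounds(p, alpha):
    """Error bounds from the proof for a p-term DNF with Pr[label = 1] = alpha.

    Assumes 1/4 <= alpha <= 3/4, i.e. the branch that calls the agnostic learner.
    """
    # omega' has quadratic error 1/4 on negatives and 1/4 on at most a
    # (1 - 1/p) share of the positives, 0 on the rest.
    best_in_class = (1 - alpha) * QUARTER + alpha * (1 - Fraction(1, p)) * QUARTER
    # The agnostic learner is within 1/(32 p) of the best function in the class.
    learned = best_in_class + Fraction(1, 32 * p)
    # Lemma 6.11 adds 1/4 when passing to the randomized hypothesis.
    randomized = learned + QUARTER
    return best_in_class, learned, randomized

def trivial_branch_bound(p):
    """Supremum of 1/4 + epsilon over the allowed range epsilon < 7/(32 p)."""
    return QUARTER + Fraction(7, 32 * p)

# Worst case of the agnostic branch: alpha = 1/4.
worst = weak_learner_error_bounds(3, Fraction(1, 4))
```

For `p = 3` and `alpha = 1/4`, the three returned values are `1/4 - 1/48`, `1/4 - 1/96`, and `1/2 - 1/96`, matching the chain of inequalities in the proof.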

It is easy to see that the result also holds in the logarithmic cost model of computation (Aho et al. 1974) because the second layer weights of the neural network are fixed at 1/k, where k is the number of hidden units.

6.5 Learning Bounded Fan-in Neural Networks

From the result in the previous section, it would appear that agnostically learning a single hidden layer neural network with linear threshold hidden units is likely to be computationally difficult. In this section, we consider learning a computationally tractable subclass: the class of single hidden layer neural networks with bounded fan-in.