
Each hypothesis has specific properties, and in general each hypothesis space offers different properties. In this regard, the flexibility (capacity) of a hypothesis space can be studied, where flexibility means the ability of the space to provide hypotheses that fit many different labelings. To quantify the capacity of a classifier, Vapnik proposed in [108] the concept of the VC dimension. Roughly speaking, the VC dimension is the maximum number of instances that the classifier can classify without any mistakes for every arbitrary assignment of labels. Before going into details, the concept of shattering should be introduced.

From a binary classification point of view, given $n$ instances $\{x_1, \dots, x_n\} \subset \mathbb{R}^m$, there are $2^n$ ways to assign labels from $\{-1,+1\}$ to these instances. The instances $\{x_1, \dots, x_n\}$ are said to be shattered by the model class $\mathcal{H}$ if, for every possible labeling ($2^n$ cases), there exists at least one model in $\mathcal{H}$ that classifies the instances without any error. The largest number of instances that can be shattered by the model class $\mathcal{H}$ is called the VC dimension of the model class. More formally, following Vapnik and Chervonenkis, the VC dimension of a model class $\mathcal{F}$ is defined as follows:

$$\mathrm{VC}(\mathcal{F}) = \max\Big\{\, |X| \;:\; X \subset \mathcal{X},\ \forall g \in \{-1,+1\}^{X},\ \exists h \in \mathcal{F} \text{ such that } \forall x \in X,\ h(x) = g(x) \,\Big\}.$$

By way of example, for the model class of linear functions with $m$ variables, the VC dimension is equal to $m + 1$. Figure 2.1 shows all label assignments of three instances and the corresponding separations for $m = 2$. If a model class can shatter arbitrarily large sets of instances, its VC dimension is infinite.

Loosely speaking, the VC dimension reveals the flexibility of a model class: the higher the VC dimension, the more flexible the model class.

Figure 2.1: Shattering of three instances by the model class of linear functions with two variables
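The notion of shattering can be checked by brute force on small examples. The following is a minimal sketch, not a general procedure: it enumerates all $2^n$ labelings and tries to separate each one with a perceptron. The function names, the example points, and the epoch cap are assumptions made for illustration. A perceptron is guaranteed to converge on linearly separable data, so on these tiny examples a failure within the cap indicates that no separating line exists.

```python
from itertools import product


def perceptron_separates(points, labels, max_epochs=1000):
    """Return True if a perceptron finds w, b with y * (w.x + b) > 0 for all points."""
    dim = len(points[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified or exactly on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            return True
    return False  # no separator found within the epoch cap (assumed large enough here)


def is_shattered(points, max_epochs=1000):
    """Check every labeling in {-1, +1}^n of the given points."""
    return all(
        perceptron_separates(points, labels, max_epochs)
        for labels in product((-1, 1), repeat=len(points))
    )


# Illustrative point sets in the plane (m = 2).
three_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
xor_points = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]

print(is_shattered(three_points))  # True: all 8 labelings are separable
print(is_shattered(xor_points))    # False: the XOR labeling is not separable
```

The second result only shows that this particular set of four points is not shattered. That the VC dimension equals $m + 1 = 3$ requires that no set of four points in the plane can be shattered, which this sketch does not prove.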

Structural Risk Minimization (SRM)

Under empirical risk minimization, two well-known problems can occur during the learning process: overfitting and underfitting. Overfitting occurs when the capacity (complexity) of the learner is clearly higher than what is required. Likewise, underfitting occurs when the capacity of the learner is clearly lower than what is needed. To overcome these problems, that is, to find the proper learner, Vapnik proposed the idea of structural risk minimization. Assume a family of learners is given, and assume the learners can be ordered by their complexity, e.g. by VC dimension. The main goal of structural risk minimization is to find a trade-off between the complexity of the learner and the quality of generalization. More formally, the goal is to minimize

$$R_{\mathrm{emp}}(w) + \lambda\,\mathrm{CP}(w),$$

where $R_{\mathrm{emp}}$ is the empirical risk, $\mathrm{CP}$ is the complexity penalty, and $\lambda$ is the trade-off parameter, which is determined empirically. In this regard, Vapnik proposed a bound that relates the risk to the empirical risk given the VC dimension of the model class.

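Theorem 2.1 (Vapnik) Assume $\mathcal{H}$ is a class of functions with a VC dimension of $v$. For any distribution and a sample of $n$ instances drawn from this distribution, the following inequality is valid with probability at least $1-\eta$:

$$\forall h \in \mathcal{H}, \quad R(h) \le R_{\mathrm{emp}}(h) + \sqrt{\frac{v\left(\log\frac{2n}{v} + 1\right) - \log\frac{\eta}{4}}{n}} + \frac{1}{n}. \tag{2.4}$$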
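To get a feeling for the size of the bound, the second term can be evaluated numerically. The sketch below assumes the bound has the form $R_{\mathrm{emp}}(h) + \sqrt{\big(v(\log\frac{2n}{v} + 1) - \log\frac{\eta}{4}\big)/n} + \frac{1}{n}$, with $v$ the VC dimension, $n$ the sample size, and a natural logarithm. The function names and the numbers in the usage lines are illustrative assumptions.

```python
import math


def vc_confidence(v, n, eta):
    """Second term of the assumed bound: grows with v, shrinks with n.

    Assumed form: sqrt((v * (log(2n / v) + 1) - log(eta / 4)) / n) + 1 / n.
    Only meaningful for v > 0, n > 0 and 0 < eta < 1.
    """
    if v <= 0 or n <= 0 or not (0.0 < eta < 1.0):
        raise ValueError("need v > 0, n > 0 and 0 < eta < 1")
    inside = (v * (math.log(2 * n / v) + 1) - math.log(eta / 4)) / n
    return math.sqrt(inside) + 1 / n


def risk_bound(emp_risk, v, n, eta=0.05):
    """Upper bound on the risk: empirical risk plus the confidence term."""
    return emp_risk + vc_confidence(v, n, eta)


# Illustrative values: the term grows with the VC dimension ...
print(vc_confidence(10, 1000, 0.05), vc_confidence(100, 1000, 0.05))
# ... and shrinks as the sample size grows.
print(vc_confidence(10, 1000, 0.05), vc_confidence(10, 10000, 0.05))
```

For a fixed sample size, a richer model class therefore has to pay for its flexibility with a larger gap between the guaranteed risk and the empirical risk.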

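More formally, assume the hypotheses in the hypothesis space can be organized into a nested sequence of classes:

$$\mathcal{H}_0 \subset \mathcal{H}_1 \subset \mathcal{H}_2 \subset \dots \subset \mathcal{H},$$

where $\mathcal{H} = \bigcup_{i=0}^{\infty} \mathcal{H}_i$. Moreover, assume the VC dimension of each $\mathcal{H}_i$ is equal to $v_i$. Because the classes are nested, the VC dimensions are ordered as

$$v_0 \le v_1 \le v_2 \le \dots$$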

In general, choosing a hypothesis class with a high VC dimension reduces the empirical risk because of its high flexibility, but aggravates the overfitting problem. On the other hand, choosing a hypothesis class with a low VC dimension reduces the flexibility and hence increases the empirical risk. The core idea of SRM is to find a trade-off (see Figure 2.2) between the complexity of the hypothesis class and the quality of fitting in order to reduce the generalization error $R(h)$ as much as possible. Note that, by the Vapnik theorem, choosing a hypothesis class with high flexibility increases the second term on the right-hand side of inequality (2.4).
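The trade-off can be made concrete with a small selection routine. The sketch below is one possible reading of SRM, not a definitive implementation: for each class in the nested sequence it adds an assumed confidence term (the same form as the second term of the bound, with natural logarithm) to the empirical risk and picks the class with the smallest sum. The empirical risks, VC dimensions, and sample size are hypothetical numbers chosen only for illustration.

```python
import math


def confidence_term(v, n, eta=0.05):
    """Assumed complexity term: sqrt((v(log(2n/v) + 1) - log(eta/4)) / n) + 1/n."""
    return math.sqrt((v * (math.log(2 * n / v) + 1) - math.log(eta / 4)) / n) + 1 / n


def srm_select(emp_risks, vc_dims, n, eta=0.05):
    """Return the index of the class with the smallest risk bound, and all bounds."""
    bounds = [r + confidence_term(v, n, eta) for r, v in zip(emp_risks, vc_dims)]
    best = min(range(len(bounds)), key=bounds.__getitem__)
    return best, bounds


# Hypothetical nested classes H_0, ..., H_3: richer classes fit the sample better.
vc_dims = [2, 5, 20, 100]
emp_risks = [0.30, 0.12, 0.10, 0.02]
n = 2000

best, bounds = srm_select(emp_risks, vc_dims, n)
print(best)                             # 1
print([round(b, 3) for b in bounds])
```

In this made-up example the class with the lowest empirical risk (index 3) is not selected, because its confidence term outweighs the gain in fit.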

[Plot labels: VC dimension, training error, complexity, error]

Figure 2.2: The illustration of structural risk minimization, showing the trade-off between the complexity and the quality of fitting

The Concept of Regularization

The previous part discussed that higher flexibility of the learner increases the chance of overfitting, and that the core idea of SRM is to find a proper trade-off between the complexity of the learner and the quality of generalization. To reduce the complexity of the learner, its parameters are restricted, and this is where regularization comes into play. The idea is to consider the following risk:

$$R_{\mathrm{reg}}(f) = R_{\mathrm{emp}}(f) + \lambda\,\Omega(f),$$

where $f$ refers to a learner and the function $\Omega(\cdot)$ measures the regularity of $f$. Here $R_{\mathrm{reg}}$ is called the regularized risk.
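As a small worked instance of the regularized risk, consider a one-parameter linear learner $f(x) = w x$ with squared loss and the penalty $\Omega(f) = w^2$. These choices, the function names, and the data are assumptions made for illustration; the text does not fix a particular loss or regularizer. Setting the derivative of $R_{\mathrm{reg}}$ to zero gives the closed form $w = \sum_i x_i y_i / (\sum_i x_i^2 + n\lambda)$ used below.

```python
def empirical_risk(w, xs, ys):
    """Mean squared error of the learner f(x) = w * x on the sample."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def regularized_risk(w, xs, ys, lam):
    """R_reg(f) = R_emp(f) + lam * Omega(f), with the assumed Omega(f) = w^2."""
    return empirical_risk(w, xs, ys) + lam * w ** 2


def fit_regularized(xs, ys, lam):
    """Minimizer of the regularized risk for this one-parameter model."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + n * lam)


# Illustrative data, roughly y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

for lam in (0.0, 1.0, 10.0):
    w = fit_regularized(xs, ys, lam)
    print(lam, round(w, 3), round(empirical_risk(w, xs, ys), 3))
```

As $\lambda$ grows, the parameter is pulled towards zero and the empirical risk rises, which is the restriction of the learner's parameters described above.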
