
The feature selection approach that we propose in Section 4.1 is built on a powerful concept: maximizing dependence between features and class labels. In fact, this principle allows us to define a unifying framework that subsumes many known feature selection algorithms. Furthermore, it can be transferred to other tasks in data mining and to applications in bioinformatics.

5.4.1 Gene Selection via the BAHSIC Family of Algorithms

In [Song et al., 2007a], we show that the BAHSIC family of feature selection algorithms subsumes a whole battery of feature selectors known from the bioinformatics literature: Pearson’s correlation coefficient [van’t Veer et al., 2002, Ein-Dor et al., 2006], t-test [Tusher et al., 2001], signal-to-noise ratio [Golub et al., 1999], Centroid [Bedo et al., 2006, Hastie et al., 2001], Shrunken Centroid [Tibshirani et al., 2002, Tibshirani et al., 2003] and ridge regression [Li and Yang, 2005]. Given the large number of methods that have been defined, such a unifying framework can help to reveal their theoretical connections. Ultimately, by understanding the theoretical links between different feature selectors, we hope to understand why different gene selectors prefer different genes, and to be able to choose the best feature selector for a particular task based on theoretical considerations.

5.4.2 Dependence Maximization View of Clustering

The concept of maximizing dependence between features and class labels of data objects can be extended to other tasks in data mining. In clustering, class labels are assigned to data objects such that the dependence between their features and their labels is maximized. This is a novel view of clustering that we have recently begun to explore [Song et al., 2007b]. The fact that we maximize dependence in terms of a kernel matrix on the features and a kernel matrix on the labels creates a rich framework for expressing intra-dependencies between features and labels. In this fashion, we can design novel, principled clustering algorithms. Clustering of microarray data is just one of the many potential applications of this technique in bioinformatics.
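As a hedged illustration of measuring dependence through a kernel matrix on the features and a kernel matrix on the labels, the sketch below computes the biased empirical Hilbert-Schmidt Independence Criterion, tr(KHLH)/(m-1)^2, the dependence measure underlying the BAHSIC family. The estimator's formula is not stated in this section; the function names, the linear kernel, and the toy data are assumptions chosen purely for illustration.

```python
def linear_kernel(values):
    """Kernel matrix K[i][j] = v_i * v_j for scalar inputs (illustrative choice)."""
    return [[a * b for b in values] for a in values]


def center(K):
    """Return HKH, where H = I - (1/m) 11^T is the centering matrix."""
    m = len(K)
    row_mean = [sum(row) / m for row in K]
    col_mean = [sum(K[i][j] for i in range(m)) / m for j in range(m)]
    total_mean = sum(row_mean) / m
    return [[K[i][j] - row_mean[i] - col_mean[j] + total_mean
             for j in range(m)] for i in range(m)]


def hsic(K, L):
    """Biased empirical HSIC: tr(K H L H) / (m - 1)^2."""
    m = len(K)
    Kc = center(K)
    # tr(K H L H) = tr((H K H) L) by cyclicity of the trace.
    trace = sum(Kc[i][j] * L[j][i] for i in range(m) for j in range(m))
    return trace / (m - 1) ** 2


# Hypothetical toy data: one feature and two candidate labelings.
feature = [0.0, 0.0, 1.0, 1.0]
labels_dependent = [-1.0, -1.0, 1.0, 1.0]   # follows the feature
labels_unrelated = [-1.0, 1.0, -1.0, 1.0]   # ignores the feature

K = linear_kernel(feature)
hsic_dependent = hsic(K, linear_kernel(labels_dependent))
hsic_unrelated = hsic(K, linear_kernel(labels_unrelated))
```

On this toy input, the labeling that follows the feature receives the larger HSIC value. A dependence-maximizing feature selector or clustering procedure would search for the features or labels that increase this quantity.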

To conclude, based on our findings, we believe that graph kernel functions and kernel methods on graphs will be a key technique for exploiting the universality of graph models, and will significantly contribute to the advance of research in several areas of science, and in bioinformatics in particular.

Appendix A

Mathematical Background

A.1 Primer on Functional Analysis

Kernel methods borrow many concepts from functional analysis, as they compare objects in Hilbert spaces. In this section, we will define what a Hilbert space is, starting from metric spaces and vector spaces, introducing norms, inner products, Banach spaces and their properties along the way [Schölkopf and Smola, 2002, Garrett, 2004].

A metric space is a set imbued with a distance metric:

Definition 47 (Metric Space) A metric space $(M, d)$ is a set $M$ with a metric $d : M \times M \to \mathbb{R}$ such that for $x, x', x'' \in M$ the following conditions hold:

$d(x, x') \geq 0$ (A.1)

$d(x, x') = 0 \Leftrightarrow x = x'$ (A.2)

$d(x, x') = d(x', x)$ (A.3)

$d(x, x'') \leq d(x, x') + d(x', x'')$ (A.4)
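Conditions (A.1) to (A.4) can be checked numerically on a finite sample of points. The sketch below is only an illustration: the helper name `violates_metric_axioms`, the sample points, and the two candidate distance functions are assumptions, not part of the original text. A finite check can refute a candidate metric but cannot prove that the conditions hold on the whole set.

```python
from itertools import product


def violates_metric_axioms(points, d, tol=1e-12):
    """Return the first violated condition among (A.1)-(A.4) on `points`, or None."""
    for x, y in product(points, repeat=2):
        if d(x, y) < 0:
            return "A.1"  # non-negativity fails
        if (d(x, y) == 0) != (x == y):
            return "A.2"  # zero distance must mean identical points
        if abs(d(x, y) - d(y, x)) > tol:
            return "A.3"  # symmetry fails
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            return "A.4"  # triangle inequality fails
    return None


points = [0.0, 1.0, 2.0, 3.5]
absolute_difference = lambda x, y: abs(x - y)
squared_difference = lambda x, y: (x - y) ** 2

result_abs = violates_metric_axioms(points, absolute_difference)  # no violation found
result_sq = violates_metric_axioms(points, squared_difference)    # triangle inequality fails
```

The squared difference fails because the distance from 0 to 2 is 4, which exceeds the sum 1 + 1 of the distances via the intermediate point 1.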

A Cauchy sequence in a metric space $M$ is a sequence $x_1, x_2, \ldots$ with the property that for every $\varepsilon > 0$ there is an $N \in \mathbb{N}$ sufficiently large such that for $i, j \geq N$ we have $d(x_i, x_j) < \varepsilon$. A point $x \in M$ is a limit of that Cauchy sequence if for every $\varepsilon > 0$ there is an $N \in \mathbb{N}$ sufficiently large such that for $i \geq N$ we have $d(x_i, x) < \varepsilon$. A subset $M'$ of a metric space $M$ is dense in $M$ if every point in $M$ is a limit of a Cauchy sequence in $M'$. A metric space $M$ is complete if every Cauchy sequence has a limit in $M$. A metric space $M$ is bounded if there exists some number $r$ such that $d(x, x') < r$ for all $x$ and $x'$ in $M$. A metric space $M$ is compact if every sequence in $M$ has a subsequence converging to a point in $M$. If a metric space has a countable dense subset, then it is called separable. Note that every compact metric space is separable.
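To make the notions of Cauchy sequence and completeness concrete, the following sketch generates rational Newton iterates for the square root of 2. The rationals with the metric |x - y| are a standard example of a metric space that is not complete: the iterates get arbitrarily close to each other, yet their limit is not rational. The function name and the number of steps are illustrative assumptions, and a finite number of iterates only illustrates the behaviour rather than proving it.

```python
from fractions import Fraction


def newton_sqrt2(steps):
    """Rational Newton iterates x_{k+1} = (x_k + 2 / x_k) / 2, starting at 1."""
    x = Fraction(1)
    iterates = [x]
    for _ in range(steps):
        x = (x + 2 / x) / 2  # stays an exact rational number
        iterates.append(x)
    return iterates


xs = newton_sqrt2(5)

# Distances between consecutive terms shrink, as expected of a Cauchy sequence.
gaps = [abs(b - a) for a, b in zip(xs, xs[1:])]

# No iterate squares exactly to 2, so none of them is the limit itself.
residuals = [abs(x * x - 2) for x in xs]
```

Working with `Fraction` keeps every term exactly rational, so the shrinking gaps are not an artifact of floating-point rounding.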

Definition 48 (Vector Space) A set $X$ is called a vector space (or linear space) over $\mathbb{R}$ if addition and scalar multiplication are defined, and satisfy (for all $x, x', x'' \in X$, and $c, c' \in \mathbb{R}$)

$x + (x' + x'') = (x + x') + x''$, (A.5)

$x + x' = x' + x \in X$, (A.6)

$0 \in X, \; x + 0 = x$, (A.7)

$cx \in X$, (A.8)

$1x = x$, (A.9)

$c(c'x) = (cc')x$, (A.10)

$c(x + x') = cx + cx'$, (A.11)

$(c + c')x = cx + c'x$. (A.12)

We restrict ourselves to vector spaces over $\mathbb{R}$, as these are of interest to us (definitions on $\mathbb{C}$ are analogous).

Definition 49 (Normed Space) A normed space is a vector space $X$ with a non-negative real-valued norm $\|\cdot\| : X \to \mathbb{R}_0^+$ with the following properties for $x, x', x'' \in X$ and $c \in \mathbb{R}$:

$\|x\| \geq 0$, (A.13)

$\|x\| = 0 \Leftrightarrow x = 0$, (A.14)

$\|cx\| = |c| \, \|x\|$, (A.15)

$\|x + x'\| \leq \|x\| + \|x'\|$. (A.16)

When $X$ has a norm $\|\cdot\|$, there is a metric naturally associated to it: $d(x, x') = \|x - x'\|$.
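The construction of a metric from a norm can be sketched in a few lines. The example below assumes vectors in R^n represented as tuples of floats and uses the 1-norm and the Euclidean norm as two familiar instances; the function names are hypothetical and serve only to illustrate the definition.

```python
import math


def l1_norm(x):
    """Sum of absolute values of the coordinates."""
    return sum(abs(xi) for xi in x)


def l2_norm(x):
    """Euclidean norm."""
    return math.sqrt(sum(xi * xi for xi in x))


def induced_metric(norm):
    """Build d(x, y) = norm(x - y) from a norm on tuples of reals."""
    def d(x, y):
        return norm(tuple(xi - yi for xi, yi in zip(x, y)))
    return d


d1 = induced_metric(l1_norm)
d2 = induced_metric(l2_norm)

a, b = (0.0, 0.0), (3.0, 4.0)
dist_l1 = d1(a, b)  # |3| + |4|
dist_l2 = d2(a, b)  # sqrt(3^2 + 4^2)
```

The same pair of points is at a different distance under each norm, which shows that the induced metric depends on the choice of norm and not only on the underlying vector space.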

A normed space $X$ that is complete with respect to the associated metric is said to be a Banach space.

To obtain a Hilbert space, we have to equip the vector space with an inner product.

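Definition 50 (Inner Product) Let $X$ be a vector space. A real-valued function $\langle \cdot, \cdot \rangle : X \times X \to \mathbb{R}$ of two variables on $X$ is an inner product if

$\langle x, x' \rangle = \langle x', x \rangle$ (A.17)

$\langle x + x'', x' \rangle = \langle x, x' \rangle + \langle x'', x' \rangle$ (A.18)

$\langle x, x' + x'' \rangle = \langle x, x' \rangle + \langle x, x'' \rangle$ (A.19)

$\langle x, x \rangle \geq 0$ (and equality only for $x = 0$) (A.20)

$\langle cx, x' \rangle = c \langle x, x' \rangle$ (A.21)

$\langle x, cx' \rangle = c \langle x, x' \rangle$ (A.22)

where $x, x', x'' \in X$ and $c \in \mathbb{R}$.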

An inner product defines a corresponding norm on $X$ via

$\|x\| = \sqrt{\langle x, x \rangle}.$
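As a small sanity check of conditions (A.17), (A.18) and (A.21) and of the induced norm, the sketch below uses the standard dot product on R^2. The choice of inner product, the vectors, and the scalar are assumptions made for illustration.

```python
import math


def inner(x, y):
    """Standard inner product on R^n."""
    return sum(xi * yi for xi, yi in zip(x, y))


def norm(x):
    """Norm induced by the inner product: sqrt(<x, x>)."""
    return math.sqrt(inner(x, x))


x, y, z = (3.0, 4.0), (1.0, -2.0), (0.5, 2.0)
c = 2.5

# Symmetry (A.17), additivity in the first argument (A.18), homogeneity (A.21).
symmetric = inner(x, y) == inner(y, x)
additive = inner(tuple(a + b for a, b in zip(x, z)), y) == inner(x, y) + inner(z, y)
homogeneous = inner(tuple(c * a for a in x), y) == c * inner(x, y)

norm_x = norm(x)  # sqrt(3^2 + 4^2)
```

The values are chosen to be exactly representable in floating point, so the equality checks are meaningful here. With arbitrary inputs, a tolerance-based comparison would be the safer choice.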

A.2 Primer on Probability Theory and Statistics