2.7 Conclusion

In this chapter, we have scrutinised a number of machine learning algorithms in the field of dimensionality reduction. These generic algorithms lay the foundations for developing new embedding approaches, and their key ideas are repeatedly used and developed across the whole field of machine learning. We summarise their main ideas here:

a) PCA uses an orthogonal transformation to convert a set of observation variables into a set of variables called principal components. In this procedure, PCA aims to maintain as much of the variability in the data as possible.

b) LE, LLE and SNE are all locality preserving methods. They differ from one another in how they encode the locality information, i.e., through neighbourhood graphs, local linear structures or probability distributions.

c) CCA explores relationships between two sets of multivariate variables, and maximises the correlation between pairs of transformed versions of these variables.

d) An RBM is an instance of a generative stochastic model. It defines a parametric probability distribution over its set of inputs, and learning is carried out by minimising the mismatch between the input data and this probability distribution.

We have come across the trace optimisation problem on numerous occasions (such as in Eqs. (2.1.9), (2.2.7), (2.3.9) and (2.5.4)). This problem is central to almost all matrix factorisation methods and, owing to its convexity, its solution can be given directly in analytic form. In this chapter, we have closely studied various forms of this trace optimisation problem; clarifying its mathematical formulation and solution will be helpful for designing new algorithms (e.g., for imposing constraints properly).
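As an illustration of the analytic solution mentioned above, the following minimal sketch assumes the generic form of the problem, namely maximising tr(WᵀAW) subject to WᵀW = I for a symmetric matrix A. The equations referenced in this chapter may carry different constraints or a minimisation instead. Under this assumption, the maximiser is formed by the eigenvectors of A with the largest eigenvalues. The function name `max_trace_projection` and the toy covariance matrix are hypothetical and serve illustration only.

```python
import numpy as np


def max_trace_projection(A, d):
    """Maximise tr(W^T A W) subject to W^T W = I for a symmetric matrix A.

    Assumed generic form: the maximiser is given by the eigenvectors of A
    associated with its d largest eigenvalues, and the optimum equals the
    sum of those eigenvalues.
    """
    A = np.asarray(A, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(A)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:d]     # indices of the d largest
    W = eigvecs[:, order]                     # orthonormal columns
    return W, float(eigvals[order].sum())


# Toy example: A is a sample covariance matrix, as in PCA.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X -= X.mean(axis=0)
A = X.T @ X / (X.shape[0] - 1)

W, optimum = max_trace_projection(A, 2)
```

With a covariance matrix as A, the columns of `W` are the leading principal directions, which is one way to see why the same trace formulation recurs across the methods reviewed here.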

In the following chapters, we present our successive works in order. It should be noted that the methods proposed in Chapters 3 and 5 rely heavily on the formulations of LE and SNE, respectively. The reader is encouraged to compare them with the LE and SNE formulations presented in this chapter.

Chapter 3

Heterogeneous Object Co-embeddings from Relational Measurements

3.1 Introduction

In Chapter 2, we reviewed some of the conventional embedding methods that build low-dimensional mappings from high-dimensional feature representations. Such methods are valuable tools for data preprocessing, data analysis and information visualisation. However, these techniques only embed homogeneous (i.e., of a single type) data objects into a low-dimensional space given their higher-dimensional feature representations. Yet in many real-world applications, data may come from heterogeneous sources, such as genes and symptoms, documents and words or images, or review articles from different domains. It could therefore be useful to handle heterogeneous types of data simultaneously, by mapping them into a single common space.

Various data processing methods have been proposed to address the problem of handling heterogeneous types of data. Examples include methods targeting specific applications, such as biological networks [83, 84], semantic analysis [64, 85] and information retrieval [86, 87]. Heterogeneous data analysis has also been performed by more generic methods. For instance, Correspondence Analysis (CA) [88] represents the rows and columns of a data matrix as points in a low-dimensional space. Latent Semantic Indexing [89] is a popular information retrieval embedding method, frequently used to embed documents and words in a common space [90]. CCA [91] attempts to maximise the correlation between two sets of measurements. Similarly, variations of nonmetric Multidimensional Scaling [64] have been used to place the corresponding reference data as close as possible, so that the patterns are aligned in the common space. More recent methods [92] can learn a joint representation from multiple datasets that lie on multiple manifolds. However, most of these techniques require the availability of pattern information from the different data representations.
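To make the idea of placing rows and columns of a data matrix in a common space concrete, the sketch below uses a truncated singular value decomposition, in the spirit of Latent Semantic Indexing. It is a generic illustration rather than the method proposed in this chapter: the function name `svd_coembedding`, the toy matrix `R` and the symmetric split of the singular values between row and column coordinates are all assumptions made for this example.

```python
import numpy as np


def svd_coembedding(R, d):
    """Embed the rows and columns of R in a shared d-dimensional space.

    Uses the rank-d truncated SVD R ~ U_d S_d V_d^T and splits the singular
    values evenly (an assumed convention), so that the inner product of a
    row coordinate and a column coordinate approximates the entry of R.
    """
    R = np.asarray(R, dtype=float)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    scale = np.sqrt(s[:d])
    row_emb = U[:, :d] * scale       # one d-dimensional point per row object
    col_emb = Vt[:d, :].T * scale    # one d-dimensional point per column object
    return row_emb, col_emb


# Toy relational matrix: 5 row objects (e.g. documents) by 4 column objects
# (e.g. words); the two sets have different cardinality.
R = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
    [0, 0, 1, 1],
], dtype=float)

rows, cols = svd_coembedding(R, 2)
approx = rows @ cols.T               # best rank-2 approximation of R
```

Because `rows @ cols.T` is the best rank-2 approximation of `R` in the Frobenius norm, row and column objects that are strongly related tend to receive coordinates with large inner products in the shared space.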

The heterogeneous embedding problem considered in this work only assumes the existence of a relational similarity matrix (corresponding to the bipartite relations in Section 1.1.1) between two sets of objects of possibly differing cardinality. This is also known as joint embedding or co-embedding [28, 86, 93]. The goal is to generate co-embeddings, where both groups of objects are embedded in a joint space. Various stochastic methods have previously been proposed to achieve this, such as Parametric Embedding [93], Co-occurrence Data Embedding (CODE) [63], Bayesian Co-occurrence Data Embedding [86], as well as a dynamic embedding model that processes a sequence of co-occurrence data changing over time [94]. These algorithms treat the co-occurrence object pairs as being generated by a Gaussian mixture in the embedding space, and then recover the embedding that maximises the likelihood of the observed data. An alternative strategy for computing co-embeddings from similarities between heterogeneous objects is Automatic Co-embedding with Adaptive Shaping (ACAS) [28], which is based on matrix factorisation, generalises ideas from embedding algorithms such as [88, 89, 95, 96], and controls the factors that generate different shapes and distributions of column and row objects in the common space. There are also methods that specialise in learning embeddings from a binary relation matrix between two groups of objects. For instance, Maximum-Margin Matrix Factorization [97] attempts to fit a binary target matrix with a low-rank inner product matrix between the embedding vectors of the row and column objects. Another method estimates the data distribution of the row and column objects from
