
In this chapter, we proposed a novel deep neural network-based joint learning model that combines a deep autoencoder with a density model.

The contributions of this method can be summarised as follows: (1) the generation and classification networks are learned simultaneously; (2) the model is trained in an end-to-end fashion that does not require pre-training of the autoencoder network; and (3) the parameter optimisation is more straightforward than in previous methods. However, the singular matrix problem caused by the density estimation still needs to be solved. To this end, how to embed the density estimation into deep neural networks through the representations of some hidden layers will be the focus of our future work. Moreover, the model can be extended to further applications such as anomaly detection and out-of-distribution detection. Finally, how to embed the infinite mixture model into a deep neural network is also a topic of interest to us.

Chapter 7

Conclusion and Future Work

This chapter concludes the thesis. First, we briefly review the journey of this dissertation, which starts from the challenges of density learning identified in the literature. Motivated by these challenges, we study three types of structured latent variables: finite latent variables, infinite latent features, and deep-structured latent variables. These studies lead to the different models discussed in this thesis. We then present future perspectives within the intended framework.

7.1 Review of the Journey

This dissertation focuses on learning density models with structured latent variables, typically from high-dimensional data. In the literature, five major how-to concerns are outlined: how to capture the crucial attributes in a lower-dimensional space, how to reduce the number of free parameters, how to estimate distributions that fit the complicated data manifold, how to increase model flexibility, and how to reduce the learning difficulties. In this dissertation, these challenges are addressed by constructing different latent variable structures for density models. The structured latent variables can correspond to different concepts by assuming different density distributions on the latent variables. We have reviewed state-of-the-art topics in the areas of finite mixture models, infinite latent feature models, and deep models. More importantly, we have made contributions to each of these topics.

Based on the finite latent variable structures, we have introduced several joint learning models, which are applied to both clustering and classification. First, by designing a common loading matrix, we propose a finite mixture model that performs learning and dimensionality reduction simultaneously. The common loading matrix can be regarded as a global dimensionality reduction matrix, so that effective low-dimensional representations can be captured and calibrated for the subsequent learning tasks. An additional advantage is that the proposed model significantly reduces the number of free parameters compared with the traditional finite mixture model. Then, we propose a two-layer mixture model with a global loading matrix for discriminant analysis. This mixture-of-mixtures structure is used to better capture the complex properties of each class. The approach has been validated on both synthetic and real-world datasets, including benchmark clustering datasets and small-sample-size datasets. In clustering, the proposed joint learning models are shown to significantly outperform the separated learning models. In classification, the two-layer mixture model with a global loading matrix gives the best results among the compared mixture component models when the number of samples in each class is limited.
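The parameter saving from a common loading matrix can be illustrated with a rough count. The sketch below is only an assumption about the parameterisation, not the exact model of this thesis: it compares a full-covariance Gaussian mixture in p dimensions with a mixture defined in a shared q-dimensional latent space, with one p x q loading matrix and one diagonal noise covariance shared by all g components. The function names are hypothetical, and the counts are raw counts that ignore identifiability corrections.

```python
def gmm_free_params(p, g):
    """Raw parameter count of a full-covariance Gaussian mixture.

    p: data dimension, g: number of components.
    """
    weights = g - 1                      # mixing proportions sum to one
    means = g * p                        # one p-dimensional mean per component
    covariances = g * p * (p + 1) // 2   # one symmetric p x p matrix per component
    return weights + means + covariances


def common_loading_free_params(p, q, g):
    """Raw parameter count of a mixture with a shared loading matrix.

    Assumed form (an illustration, not necessarily the thesis model):
    x = A z + e, with A a shared p x q matrix, e ~ N(0, D) with D diagonal
    and shared, and z following a g-component Gaussian mixture in q dimensions.
    """
    weights = g - 1
    loading = p * q                          # shared loading matrix A
    noise = p                                # shared diagonal noise D
    latent_means = g * q                     # component means in latent space
    latent_covs = g * q * (q + 1) // 2       # component covariances in latent space
    return weights + loading + noise + latent_means + latent_covs


if __name__ == "__main__":
    p, q, g = 100, 5, 10
    print(gmm_free_params(p, g))                # 51509
    print(common_loading_free_params(p, q, g))  # 809
```

Under these assumptions, the per-component cost grows with q rather than with p, which is why the saving becomes large when the data dimension is high and the latent dimension is small.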

To address the limitation that certain parameters need to be pre-specified in many finite mixture models, an infinite latent feature model is proposed. A non-parametric prior (the IBP prior) is used to improve the flexibility of the infinite model, which can automatically determine an optimal number of features. Meanwhile, we have contributed a tri-factorisation framework to reveal the latent structures among items (samples) and attributes (features). This model also delivers latent binary features without requiring extra constraints. An efficient optimisation algorithm is developed accordingly. In the experiments, the proposed infinite latent feature model significantly outperforms the other four competitive algorithms on various tasks including reconstruction, pre-image restoration, and clustering. Moreover, a series of experiments on feature extraction have demonstrated that the proposed tri-factorisation model has a superior ability to extract both the latent structures and the features, particularly from data with complex structures.
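To illustrate how an IBP (Indian Buffet Process) prior leaves the number of features open, the following minimal sketch draws a binary item-by-feature matrix from the standard IBP generative process. It illustrates the prior only, not the tri-factorisation model or its optimisation algorithm; the function names, the concentration parameter `alpha`, and the seed are assumptions made for this example.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw from Poisson(rate) with Knuth's method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_items, alpha, seed=0):
    """Draw a binary matrix Z (items x features) from the IBP prior.

    The number of columns is not fixed in advance: it is determined by
    the draw itself and grows with num_items and alpha.
    """
    rng = random.Random(seed)
    feature_counts = []  # feature_counts[k] = earlier items that own feature k
    rows = []
    for i in range(1, num_items + 1):
        # Item i takes existing feature k with probability m_k / i.
        row = [1 if rng.random() < m / i else 0 for m in feature_counts]
        for k, z in enumerate(row):
            feature_counts[k] += z
        # Item i then introduces Poisson(alpha / i) brand-new features.
        num_new = sample_poisson(alpha / i, rng)
        row.extend([1] * num_new)
        feature_counts.extend([1] * num_new)
        rows.append(row)
    width = len(feature_counts)
    # Pad earlier rows with zeros so that Z is rectangular.
    return [r + [0] * (width - len(r)) for r in rows]


if __name__ == "__main__":
    Z = sample_ibp(num_items=10, alpha=2.0, seed=1)
    print(len(Z), "items,", len(Z[0]), "features")
```

Each new item reuses popular features and occasionally adds new ones, so the number of active features is inferred from the data rather than set by hand.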

Finally, we have introduced two models based on deep-structured latent variables: a layer-wise-based model and a deep autoencoder-based model. Both deep models are proposed with the purpose of fitting the complicated data manifold while alleviating the learning difficulty. The first deep density model adopts a greedy layer-wise learning approach that uses the same scheme to train each hidden layer. Its inference and parameter computation procedures are more straightforward than those of previous methods. This model is evaluated on empirical tasks including clustering and generation on real-world datasets. The results show that the proposed model achieves better performance than standard and state-of-the-art layer-wise-based methods. The other deep density model employs a deep autoencoder as a dimensionality reduction procedure and optimises the parameters jointly. The deep autoencoder is more powerful than a simple multi-linear projection in finding low-dimensional representations. Importantly, this is an end-to-end model that does not require pre-training of the autoencoder network, and it can be optimised straightforwardly with standard backpropagation. In the experiments, this model was evaluated on reconstruction, generation, classification, and rejection tasks. The results show that, when data are insufficient, our earlier models are more applicable. In addition, the proposed model has demonstrated outstanding performance compared with the general deep autoencoder and the convolutional neural network.
