In this chapter, we have considered Hyperspectral Unmixing as a feature learning prob- lem. In particular, we have utilized the BNFL framework, in which we incorporated the assumptions on the endmembers and abundances. A remarkable advantage in com- parison to existing unmixing algorithm is that the proposed method allows to infer the abundances, the endmembers and the number thereof jointly. We propose the use of PT to alleviate the problems induced by the multi-modal posterior distribution. The simulations experiments show that the desired quantities can be reliably inferred. In case of low SNR, the proposed algorithm tends to overestimate the number of endmem- ber slightly, producing noisy replica of present endmembers. Applying the algorithm to real hyperspectral images results in accurate estimates with results that are in line with those reported in previous work.
83
Chapter 5
Bayesian Feature-Based Learning From
Demonstrations
In this chapter, we extend the BNFL framework for the use in decision-making. We assume that the states of the agent are composed of several latent features, which determine the action the agent takes. The benefits of this approach are two-fold: first, the states can be represented compactly, rendering the decision-making problem much more efficient as compared to working in the original domain. Second, the features can be regarded as causes for the observed decisions, allowing for a deeper understanding of the observed behavior.
This chapter is organized as follows. In Section 5.1, we introduce and motivate concepts for Learning From Demonstrations (LFD). We highlight the contributions of this chapter in Section 5.2 and formulate the problem in Section 5.3. Afterwards, we give an overview of the state of the art of feature learning in decision-making in Section 5.4. In Section 5.5, we motivate and present the model for feature-based decision-making. A Bayesian model is then detailed in Section 5.6 and we explain how the number of latent features can be inferred from the data. Section 5.7 explains the inference scheme for the latent variables. The proposed algorithm is empirically evaluated with simulation experiments in Section 5.8, demonstrating the performance of the algorithm. Real data experiments are conducted in Section 5.9. We discuss our findings in Section 5.10 and end with a short conclusion in Section 5.11.1
5.1
Motivation and Introduction
Decision-making plays a crucial role in many applications, such as robot learning, driver assistance systems, and recommender systems [174]. A fundamental question in decision-making is how an agent can learn to make optimal decisions. Learning from an experienced teacher provides a natural means to solve this problem, without the need of explicitly defining rules for the desired behavior. Further, this approach naturally avoids risky states, a significant problem in Reinforcement Learning (RL)
1This chapter has served as basis for the journal article:
J. Hahn and A. M. Zoubir, “Bayesian Nonparametric Feature and Policy Learning for Decision- Making”, submitted to Pattern Recognition, 2016.
84 Chapter 5: Bayesian Feature-Based Learning From Demonstrations
approaches that we investigate in [175]. Further, observing a teacher may provide a deeper understanding of the decision-making process. Therefore, LFD [4] has gained a lot of interest in the recent past.
According to [4], approaches for LFD can be grouped into (i) reward-based models and (ii) imitation learning. In reward-based models, it is assumed that the agent makes its decision based on a reward which is, in the context of LFD, learned from observations (as in Inverse Reinforcement Learning (IRL) [176]). Inferring the reward can be challenging and involves solving a MDP, usually by means of a RL algorithm. We proposed an Expectation Maximization framework for IRL in [177]. In imitation learning, it is assumed that an experienced teacher can be observed. Thus, the policy, telling the agent how to act in a given situation, can be learned by understanding the direct relation between the teacher’s states and actions. Especially in reward- based approaches, problems defined on infinite spaces, where the state of an agent can take continuous values, are mostly intractable to solve and efficient approximations are needed. Only in the case of finite state and action spaces and known rewards, learning optimal policies has been solved [178]. A typical approach to facilitate this problem is the representation of the state space in a feature domain, c.f. [179, 180].
However, decision-making has been viewed only from a limited feature-based perspec- tive, where features are usually designed by experts and mainly serve the purpose of reducing the size of the state space. We argue that, in many practical systems, the agent makes its decision based on a compact representation of the observed data, which can be considered as a projection onto a feature space of the decision-making problem. As an example, consider a person driving a car. The driver’s observations consists of, i.a., the location, speed, and acceleration of his vehicle and the surrounding vehicles, the type of the road and the weather conditions. However, the driver makes his decision based on a subset of the available information, e.g., on the time-to-reach between his and the other road users’ cars. This idea has also been investigated from a psychologi- cal point of view by the concept of discovering latent causes in human behavior, which is related to learning state space representations [181].
5.2
Contributions
Assuming that a certain structure underlies the observations, we aim at inferring the latent features and build a feature-based representation of the states, yielding the fol- lowing two advantages: first, the states can be represented compactly, rendering the decision-making problem much more efficient as compared to working in the original