4. La ideología política de los excombatientes
4.3 El problema generacional
In this chapter, we introduced a new regularized NMF algorithm for single channel source separation. The energy independent prior GMM was used to force the NMF solution to satisfy the statistical nature of the estimated source signals. The gains found in NMF solution were encouraged to increase their likelihood with the prior gain models of the source signals. Gaussian mixture models were used to model the log- normalized gain prior to improve the separation results. Our experiments indicate that
the proposed approach is a promising method in single channel speech-music and speech- speech separation using various target-to-background energy ratios and different NMF divergence functions.
Regularized NMF using HMM
priors
4.1
Motivations and overview
The NMF solutions in Section2.3do not consider the temporal information between the consequent frames in the spectrogram. The temporal information between the frames is important information that can be used to improve any audio signal processing system. In Chapter3, Gaussian mixture model (GMM) was used as a prior that guides the NMF solution of the gains matrix to get better solution for the NMF cost function. A better solution means a solution that is more compatible with the nature of the source signals. GMM models the columns of the trained gains matrix without considering the dynamic structure of the processed audio signals. GMM treats the columns of the trained gains matrix independently from each other. The temporal structure is important information that needs to be considered when we model any audio signal.
In this chapter, we try to guide the solution of NMF during the separation stage to consider temporal and statistical prior information. The columns of the trained gains matrix represent the valid gain combination sequences for a certain type of source sig- nal. The gains matrix can be used to train a prior model for the valid weight pattern sequence for each source. The prior models can guide the NMF decomposition weight- s/gains during the separation stage to find a solution that can be considered as valid weight combination sequences for the underlying source signal while minimizing the
NMF reconstruction error. The trained gains matrix is used here to build a HMM prior model for each source.
Figure 4.1 shows an example similar to Figures 2.1 and 3.1 where the clustering and temporal structures of the nonnegative linear combinations of the given two basis vectors can be seen. The possibility of staying at the same cluster or moving to another cluster is considered in this figure which raises the need for an HMM to model the shown data. This description is for a simplified case where each cluster corresponds to a single state in the HMM model, or in other words HMM state emission distributions are single Gaussian distributions.
Figure 4.1: The cluster and temporal structures for the nonnegative linear combina- tions of the basis vectors.
Since the trained basis vectors are the same during the training and the separation stage, we believe these clustering and temporal structures will be inherited in the gains matrix. We conjecture that the sequence of columns in the gains matrix can be considered as a sequence of features extracted from the signal so that it can be modeled well with a HMM. HMM is used extensively in speech recognition to model time-series signals. The columns of the trained gain matrices are normalized by the `2 norm, and their logarithm is taken and used to train the prior HMM for each source. The trained basis matrix and the prior HMM are jointly used as representative models for the training data for each source. As in Chapter 3, training the basis matrix and the prior model can be
done either in two steps sequentially, or all model parameters can be learnt using joint training.
From the previous chapter we can see that, using joint training gives better results than using sequential training. To avoid repetitions, we will only consider using joint training. We use sequential training for initializing the model parameters, then we use joint training to learn the model parameters. In the separation stage and after observing the mixed signal, NMF is used to decompose the spectrogram of the mixed signal as a weighted linear combination of the columns of the trained basis matrices. The sequence of the decomposition weight combinations are jointly encouraged to increase the log- likelihood with their corresponding trained prior HMMs. The solution that decreases the NMF construction error and increases the log-likelihood with the prior HMMs is computed from solving a regularized NMF cost function. The proposed algorithm models the prior information using HMM, which is a rich model to represent the statistical distribution of any sequential training data. Temporal relations between sequential frames are also modeled in the HMM using transition probabilities among states. Since the HMMs are trained using normalized data, there is no restriction on the energy level of the testing data compared to the training data. Moreover, the source signals can have different energy levels in the mixed signal without any limitations. In the previous chapter, GMM priors improve the separation performance regardless of the used NMF cost function. To avoid repetition as in previous chapter, we will apply HMM priors on the IS-NMF solution only.