7.1 Summary and Conclusions
The work presented in this study represents promising strides towards the solution of the cocktail party problem (CPP) within the blind source separation (BSS) framework. The main aim was to add novel contributions that enhance the performance of independent vector analysis (IVA), in its different versions, in separating speech sources from their observed mixtures in real reverberant environments. The main challenge in blind audio source separation (BASS) is the convolutive mixing of the sources in real room environments. This necessitates conducting the separation in the frequency domain (FD) to avoid the computational complexity of the convolution operation in the time domain. In Chapter 2, background theory and fundamentals related to convolutive blind source separation (CBSS) were introduced, together with previous related work and its limitations. Independent component analysis (ICA), a prominent FD-BSS technique, was discussed, which led to the permutation problem in FD-BSS. Then, the independent vector analysis
(IVA) algorithm, based on an improved model of the ICA method that addresses the permutation problem inherent to ICA, was reviewed in its natural gradient form (NG-IVA) and fast fixed-point form (FastIVA). The heart of the IVA method is the multivariate source prior used to model the speech signals, because the non-linear score function that retains the inter-frequency dependency is obtained from the probability density function (PDF) of the source prior.
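To illustrate how a score function couples the frequency bins of one source, the following minimal sketch implements the score function commonly associated with the spherical super Gaussian prior in the IVA literature, assuming unit-variance bins. The function name `super_gaussian_score` and the array layout are assumptions made for this illustration, not notation taken from the thesis.

```python
import numpy as np

def super_gaussian_score(s):
    """Score function of a spherical super Gaussian source prior.

    s: complex array of shape (K, T) holding K frequency bins and
       T time frames of ONE estimated source (assumed layout).
    Returns s[k, t] / sqrt(sum_k |s[k, t]|^2), same shape as s.
    """
    # The norm is taken over ALL frequency bins of a frame, so every bin's
    # score depends on every other bin of the same source.
    norm = np.sqrt(np.sum(np.abs(s) ** 2, axis=0, keepdims=True))
    return s / np.maximum(norm, 1e-12)  # guard against all-zero frames

# One frame with two frequency bins.
frame = np.array([[3.0 + 0.0j], [4.0 + 0.0j]])
score = super_gaussian_score(frame)
```

Because the denominator is shared by all bins, the bins of one source are scaled together, which is the inter-frequency dependency that a bin-wise ICA score function lacks.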
In Chapter 3, techniques and settings related to the implementation and evaluation of speech and blind audio source separation (BASS) systems were outlined. Different experimental setups were described, including information on datasets for speech sources, room environments and models, as well as the performance measures used in the evaluation criteria. The separation performance of the different algorithms was mainly measured objectively by signal to distortion ratio (SDR) in dB [76] or subjectively by perceptual evaluation of speech quality (PESQ) (on a scale of 0 to 4.5) [81], using simulated room impulse responses [71] and real binaural room impulse responses (BRIRs) [74, 75].
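As a rough illustration of what an SDR value in dB expresses, the sketch below computes a simplified, projection-based SDR between a reference source and its estimate. It is an assumption-laden simplification: all residual error is lumped into a single distortion term, and the evaluation method cited as [76] may decompose and compute the measure differently. The function name `simple_sdr_db` is hypothetical.

```python
import numpy as np

def simple_sdr_db(reference, estimate, eps=1e-12):
    """Simplified signal to distortion ratio in dB (illustrative only).

    The estimate is split into a scaled copy of the reference (target)
    and everything else (distortion).
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    # Least-squares scale that best matches the reference to the estimate.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    distortion = estimate - target
    return 10.0 * np.log10((np.sum(target ** 2) + eps) /
                           (np.sum(distortion ** 2) + eps))

# Reference plus a small error that is orthogonal to it.
ref = np.array([1.0, 0.0, 1.0, 0.0])
est = np.array([1.0, 0.1, 1.0, 0.1])
sdr = simple_sdr_db(ref, est)  # target energy 2, distortion energy 0.02
```

In this toy case the energy ratio is 100, so the result is about 20 dB.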
The contributions of this thesis satisfy the research objectives outlined in the introduction chapter. The objectives were addressed by introducing new methods to enhance the performance of the IVA algorithm in its various forms. The contributions can be summarised as follows:
1. A new multivariate Student’s t distribution as the source prior for the batch IVA algorithm.
2. A novel energy driven mixed distribution model as a source prior for the batch IVA algorithm.
3. A particular multivariate generalized Gaussian distribution as the source prior for the online IVA algorithm.
4. A novel adaptive learning scheme to improve the performance of the online IVA algorithm.
5. A novel switched source prior technique for the adaptive learning online IVA algorithm.
In Chapter 4, a new multivariate Student’s t distribution was proposed as the source prior for the batch IVA algorithm. A Student’s t PDF can better model certain non-stationary frequency domain speech signals due to its tail dependency property. The tails of the distribution can be tuned to closely match the generally heavy-tailed distribution of frequency domain speech signals, which arises from their high amplitude data points. The chapter first provided an experimental comparison between the batch versions of ICA and IVA. The results demonstrated the poor performance of standard ICA due to the permutation problem and showed that IVA directly addresses this problem. Then, the separation performance of the IVA algorithm with the new source prior was compared with that of the original super Gaussian source prior in simulated and real room environments with a variety of settings. The experimental results confirmed that the proposed Student’s t source prior consistently improves the separation performance of the IVA algorithm.
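The effect of the heavy tails on the non-linearity can be sketched as follows. The score function below is derived from a multivariate Student’s t density of the assumed form p(s) ∝ (1 + ‖s‖²/ν)^(−(ν+K)/2), where ν is the degrees of freedom and K the number of frequency bins. The constants, the default ν = 4 and the function name `student_t_score` are assumptions for illustration, and the exact expression used in Chapter 4 may differ.

```python
import numpy as np

def student_t_score(s, nu=4.0):
    """Score function -d log p(s) / d s_k for an assumed Student's t prior.

    s:  complex array of shape (K, T), K frequency bins and T frames
        of ONE estimated source (assumed layout).
    nu: degrees of freedom; smaller values give heavier tails.
    """
    K = s.shape[0]
    # Frame energy summed over all bins keeps the inter-frequency coupling.
    energy = np.sum(np.abs(s) ** 2, axis=0, keepdims=True)
    return ((nu + K) / nu) * s / (1.0 + energy / nu)

frame = np.array([[3.0 + 0.0j], [4.0 + 0.0j]])
small = np.linalg.norm(student_t_score(frame))         # moderate amplitude
large = np.linalg.norm(student_t_score(10.0 * frame))  # ten times larger
```

Under this form the score magnitude shrinks for high amplitude frames (`large < small` here), whereas the super Gaussian score s_k/‖s‖ always has unit norm. In this sketch, ν controls how quickly that attenuation sets in.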
Using simulated room impulse responses [71], the average recorded SDR improvement using the new Student’s t source prior was approximately 0.75 dB compared with the original IVA method. In a real, highly reverberant environment [74], the average recorded SDR improvement was approximately 1.31 dB compared with the original IVA method.
This confirms the suitability of the Student’s t distribution for modelling speech signals in real life scenarios. The subjective study confirmed the improved separation performance of the IVA method with the Student’s t source prior. The average improvement in PESQ score was approximately 0.75.
In Chapter 5, a novel multivariate source prior for the IVA algorithm was introduced. The proposed source prior is a mixture of two distributions instead of a single distribution, namely the original multivariate super Gaussian distribution and the multivariate Student’s t distribution. Human speech is highly non-stationary with variable amplitude components. In the proposed mixed source prior, the Student’s t distribution models the high amplitude components and the original super Gaussian distribution models the lower amplitude components of the speech signal. Firstly, equal weights were assigned to the original super Gaussian distribution and the Student’s t distribution in the mixed source prior. Then, the prior was further enhanced with an energy driven scheme that adjusts the weight of each distribution according to the normalised energy of the observed mixtures in the frequency domain blocks of a clique based dependency model. As a result, the mixed source prior was able to adapt to the different statistical properties of speech signals.
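One way such energy driven weights could be formed is sketched below. The mapping from normalised block energy to weights is purely an assumption for illustration: the blocks are taken as contiguous and non-overlapping, block energies are normalised by the largest block energy, and that value is used directly as the Student’s t weight. The actual scheme and clique structure in Chapter 5 may differ, and the function name `energy_driven_weights` is hypothetical.

```python
import numpy as np

def energy_driven_weights(x, n_blocks=4):
    """Hypothetical energy driven weights for a mixed source prior.

    x: complex mixtures of shape (M, K, T): M microphones, K frequency
       bins, T time frames (assumed layout).
    Returns (w_student_t, w_super_gaussian), each of shape (n_blocks,),
    with the two weights of every block summing to one.
    """
    # Energy of the observed mixtures in each frequency bin.
    bin_energy = np.sum(np.abs(x) ** 2, axis=(0, 2))
    # Group the bins into contiguous frequency blocks (assumed structure).
    block_energy = np.array([b.sum() for b in np.array_split(bin_energy, n_blocks)])
    # Normalise to [0, 1] by the largest block energy (assumed choice).
    norm_energy = block_energy / max(block_energy.max(), 1e-12)
    w_student_t = norm_energy            # high energy -> more Student's t
    return w_student_t, 1.0 - w_student_t

# Two microphones, four bins, three frames; the two lowest bins are louder.
x = np.ones((2, 4, 3), dtype=complex)
x[:, :2, :] *= 2.0
w_t, w_sg = energy_driven_weights(x, n_blocks=2)
```

In this toy case the louder block receives the full Student’s t weight and the quieter block leans towards the super Gaussian component, matching the roles the two distributions play in the mixed prior.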
The fixed mixed source prior was adopted for the IVA and the FastIVA algorithms and compared with the original single super Gaussian source prior. The detailed experimental studies using simulated [71] and real room environments [74] with different reverberation times confirmed a consistent separation performance improvement for the fixed mixed source prior based IVA. Table 7.1 shows the approximate av-