3.3.- Standing in the Collective Indirect Amparo Proceeding

In hybrid DNN/HMM systems, the observation probability is a true probability that satisfies the appropriate constraints. However, we may remove this constraint and replace the state log-likelihood with other scores. In the Kullback-Leibler divergence-based HMM (KL-HMM) [1,2], the state score is computed as

$$S_{\mathrm{KL}}(s, z_t) = \mathrm{KL}(y_s \,\|\, z_t) = \sum_{d=1}^{D} y_s^d \ln \frac{y_s^d}{z_t^d}, \tag{6.14}$$

where $s$ is a state (e.g., a senone), $z_t^d = P(a_d \mid x_t)$ is the posterior probability of some class $a_d$ given the observation $x_t$, $D$ is the number of classes, and $y_s$ is a probability distribution that represents the state $s$. Theoretically, $a_d$ can be any class. In practice, though, context-independent phones or states are typically chosen as $a_d$. For example, $z_t$ can be the output of a DNN whose output neurons represent monophones.
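The KL state score of Eq. 6.14 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `kl_score`, the flooring constant `eps`, and the three-class example vectors are hypothetical and do not come from [1,2].

```python
import math

def kl_score(y_s, z_t, eps=1e-12):
    """KL state score of Eq. 6.14: sum_d y_s^d * ln(y_s^d / z_t^d).

    y_s: state distribution over the D classes.
    z_t: class posteriors for frame t (e.g., a DNN's monophone outputs).
    eps: assumed floor on z_t^d to avoid division by zero.
    """
    if len(y_s) != len(z_t):
        raise ValueError("y_s and z_t must have the same number of classes D")
    total = 0.0
    for y, z in zip(y_s, z_t):
        if y > 0.0:  # a term with y = 0 contributes 0 by convention
            total += y * math.log(y / max(z, eps))
    return total

# Hypothetical example with D = 3 classes.
y_s = [0.7, 0.2, 0.1]
z_match = [0.7, 0.2, 0.1]     # posterior identical to the state distribution
z_mismatch = [0.1, 0.2, 0.7]  # posterior concentrated on a different class

print(kl_score(y_s, z_match))     # 0.0
print(kl_score(y_s, z_mismatch))  # positive: a worse match gives a larger score
```

Because the score is a divergence, lower values indicate a better match between the frame and the state, which is the opposite sign convention from a log-likelihood.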

Unlike in the hybrid DNN/HMM systems, in KL-HMM $y_s$ is an additional model parameter that needs to be estimated for each state. In [1,2], $y_s$ is optimized to minimize the average frame score defined in Eq. 6.14 while keeping $z_t$ (and thus the DNN or MLP) fixed.
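For intuition, consider the simplified case in which the set of frames assigned to a state is already fixed (an assumption made here for illustration; the text does not spell out the alignment procedure). Minimizing the average of the scores in Eq. 6.14 over the probability simplex with a Lagrange multiplier then gives a closed form: the state distribution is the normalized geometric mean of the assigned posteriors. The sketch below uses hypothetical function names and made-up posteriors, and is not claimed to be the exact training recipe of [1,2].

```python
import math

def estimate_state_distribution(frames, eps=1e-12):
    """Minimizer of the average KL score over a fixed set of frames.

    frames: list of posterior vectors z_t assigned to one state.
    Returns the normalized geometric mean of the posteriors.
    """
    if not frames:
        raise ValueError("need at least one posterior vector")
    num_classes = len(frames[0])
    # Mean log-posterior per class; eps is an assumed floor against log(0).
    mean_log = [
        sum(math.log(max(z[d], eps)) for z in frames) / len(frames)
        for d in range(num_classes)
    ]
    shift = max(mean_log)  # subtract the max for numerical stability
    unnormalized = [math.exp(v - shift) for v in mean_log]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def average_kl_score(y_s, frames, eps=1e-12):
    """Average over frames of sum_d y_s^d * ln(y_s^d / z_t^d)."""
    total = 0.0
    for z in frames:
        total += sum(
            y * math.log(y / max(zd, eps)) for y, zd in zip(y_s, z) if y > 0.0
        )
    return total / len(frames)

# Hypothetical posteriors for three frames aligned to the same state.
frames = [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1], [0.7, 0.2, 0.1]]
y_s = estimate_state_distribution(frames)
arithmetic_mean = [sum(z[d] for z in frames) / len(frames) for d in range(3)]

# The geometric-mean estimate should score no worse than the arithmetic mean.
print(average_kl_score(y_s, frames) <= average_kl_score(arithmetic_mean, frames))
```

In a full system the frame-to-state assignment would itself be re-estimated, so this update would sit inside an outer alignment loop.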

Alternatively, the reverse KL (RKL) distance

$$S_{\mathrm{RKL}}(s, z_t) = \mathrm{KL}(z_t \,\|\, y_s) = \sum_{d=1}^{D} z_t^d \ln \frac{z_t^d}{y_s^d}, \tag{6.15}$$

or the symmetric KL (SKL) distance

$$S_{\mathrm{SKL}}(s, z_t) = \mathrm{KL}(y_s \,\|\, z_t) + \mathrm{KL}(z_t \,\|\, y_s) \tag{6.16}$$

can be used as the state score.
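The three scores differ only in the order of the arguments to the divergence, as the following minimal sketch shows. The function names, the `eps` floor, and the two-class example values are assumptions chosen for illustration.

```python
import math

def kl(p, q, eps=1e-12):
    """KL(p || q) = sum_d p^d * ln(p^d / q^d), with q floored at eps."""
    return sum(
        pd * math.log(pd / max(qd, eps)) for pd, qd in zip(p, q) if pd > 0.0
    )

def kl_score(y_s, z_t):
    """Eq. 6.14: KL(y_s || z_t)."""
    return kl(y_s, z_t)

def rkl_score(y_s, z_t):
    """Eq. 6.15: reverse KL, KL(z_t || y_s)."""
    return kl(z_t, y_s)

def skl_score(y_s, z_t):
    """Eq. 6.16: symmetric KL, the sum of both directions."""
    return kl(y_s, z_t) + kl(z_t, y_s)

# Hypothetical two-class example.
y_s = [0.9, 0.1]
z_t = [0.5, 0.5]

print(kl_score(y_s, z_t))   # differs from the reverse direction
print(rkl_score(y_s, z_t))
print(skl_score(y_s, z_t))  # unchanged if y_s and z_t are swapped
```

The KL divergence is not symmetric, so the KL and RKL scores generally rank states differently; only the SKL score treats the two distributions interchangeably.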

Note that KL-HMM can be considered a special DNN/HMM in which $a_d$ serves as a neuron in a $D$-dimensional bottleneck layer of the DNN and the softmax layer is replaced with the KL distance. For this reason, when comparing hybrid DNN/HMM systems with KL-HMM systems, an additional layer should be added to the DNN/HMM hybrid system for a fair comparison.³

Compared to the simpler hybrid DNN/HMM systems, KL-HMM has two other drawbacks: the parameters $y_s$ are estimated separately from the DNN model instead of being jointly optimized as in the hybrid systems, and sequence-discriminative training (which we will discuss in Chap. 8) is not as straightforward in KL-HMM as in the hybrid systems. For these reasons, this book focuses on the hybrid DNN/HMM system instead of the KL-HMM, although the latter is an interesting model as well.

References

1. Aradilla, G., Bourlard, H., Magimai-Doss, M.: Using KL-based acoustic models in a large vocabulary recognition task. In: Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), pp. 928–931 (2008)

2. Aradilla, G., Vepa, J., Bourlard, H.: An acoustic model based on Kullback-Leibler divergence for posterior features. In: Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 4, pp. IV–657 (2007)

3. Bahl, L., Brown, P., De Souza, P., Mercer, R.: Maximum mutual information estimation of hidden Markov model parameters for speech recognition. In: Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 11, pp. 49–52 (1986)

4. Bourlard, H., Morgan, N., Wooters, C., Renals, S.: CDNN: a context dependent neural network for continuous speech recognition. In: Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 2, pp. 349–352 (1992)

³ Unfair comparisons were conducted in several papers that compare the hybrid DNN/HMM system with the KL-HMM system. The conclusions in these papers are thus questionable.


5. Bourlard, H., Wellekens, C.J.: Links between Markov models and multilayer perceptrons. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI) 12(12), 1167–1178 (1990)

6. Dahl, G.E., Yu, D., Deng, L., Acero, A.: Large vocabulary continuous speech recognition with context-dependent DBN-HMMs. In: Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4688–4691 (2011)

7. Dahl, G.E., Yu, D., Deng, L., Acero, A.: Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans. Audio, Speech Lang. Process. 20(1), 30–42 (2012)

8. Godfrey, J.J., Holliman, E.: Switchboard-1 Release 2. Linguistic Data Consortium, Philadelphia (1997)

9. Godfrey, J.J., Holliman, E.C., McDaniel, J.: Switchboard: telephone speech corpus for research and development. In: Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, pp. 517–520 (1992)

10. Hennebert, J., Ris, C., Bourlard, H., Renals, S., Morgan, N.: Estimation of global posteriors and forward-backward training of hybrid HMM/ANN systems (1997)

11. Hermansky, H., Ellis, D.P., Sharma, S.: Tandem connectionist feature extraction for conventional HMM systems. In: Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 3, pp. 1635–1638 (2000)

12. Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A.R., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., et al.: Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag. 29(6), 82–97 (2012)

13. Hwang, M., Huang, X.: Shared-distribution hidden Markov models for speech recognition. IEEE Trans. Speech Audio Process. 1(4), 414–420 (1993)

14. Kapadia, S., Valtchev, V., Young, S.: MMI training for continuous phoneme recognition on the TIMIT database. In: Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 2, pp. 491–494 (1993)

15. Kingsbury, B., Sainath, T.N., Soltau, H.: Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization. In: Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH) (2012)

16. Kumar, N., Andreou, A.G.: Heteroscedastic discriminant analysis and reduced rank HMMs for improved speech recognition. Speech Commun. 26(4), 283–297 (1998)

17. Morgan, N., Bourlard, H.: Continuous speech recognition using multilayer perceptrons with hidden Markov models. In: Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 413–416 (1990)

18. Morgan, N., Bourlard, H.A.: Neural networks for statistical recognition of continuous speech. Proc. IEEE 83(5), 742–772 (1995)

19. Ostendorf, M., Digalakis, V.V., Kimball, O.A.: From HMM’s to segment models: a unified view of stochastic modeling for speech recognition. IEEE Trans. Speech Audio Process. 4(5), 360–378 (1996)

20. Povey, D.: Discriminative Training for Large Vocabulary Speech Recognition. Ph.D. thesis, Cambridge University Engineering Department, Cambridge (2003)

21. Povey, D., Kanevsky, D., Kingsbury, B., Ramabhadran, B., Saon, G., Visweswariah, K.: Boosted MMI for model and feature-space discriminative training. In: Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4057–4060 (2008)

22. Povey, D., Kingsbury, B., Mangu, L., Saon, G., Soltau, H., Zweig, G.: FMPE: discriminatively trained features for speech recognition. In: Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, pp. 961–964 (2005)

23. Povey, D., Woodland, P.C.: Minimum phone error and I-smoothing for improved discriminative training. In: Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, pp. I-105 (2002)

24. Robinson, A.J., Cook, G., Ellis, D.P., Fosler-Lussier, E., Renals, S., Williams, D.: Connectionist speech recognition of broadcast news. Speech Commun. 37(1), 27–45 (2002)

25. Seide, F., Li, G., Yu, D.: Conversational speech transcription using context-dependent deep neural networks. In: Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), pp. 437–440 (2011)

26. Senior, A., Heigold, G., Bacchiani, M., Liao, H.: GMM-free DNN training. In: Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2014)

27. Su, H., Li, G., Yu, D., Seide, F.: Error back propagation for sequence training of context-dependent deep networks for conversational speech transcription. In: Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2013)

28. Trentin, E., Gori, M.: A survey of hybrid ANN/HMM models for automatic speech recognition. Neurocomputing 37(1), 91–126 (2001)

29. Yu, D., Deng, L., Dahl, G.: Roles of pre-training and fine-tuning in context-dependent DBN-HMMs for real-world speech recognition. In: Proceedings of Neural Information Processing Systems (NIPS) Workshop on Deep Learning and Unsupervised Feature Learning (2010)

30. Yu, D., Ju, Y.C., Wang, Y.Y., Zweig, G., Acero, A.: Automated directory assistance system - from theory to practice. In: Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), pp. 2709–2712 (2007)

31. Zhang, B., Matsoukas, S., Schwartz, R.: Discriminatively trained region dependent feature transforms for speech recognition. In: Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, pp. I–I (2006)

32. Zhu, Q., Chen, B., Morgan, N., Stolcke, A.: Tandem connectionist feature extraction for conversational speech recognition. In: Machine Learning for Multimodal Interaction, vol. 3361, pp. 223–231. Springer, Berlin (2005)

