Enriquecimiento ilícito - Informe sobre el examen del Perú

Even though Bi-LMs are language models, they act more as translation models as they do not model the fluency of target language but model the translation of source words. Based on this idea, we would like to extend the idea of using word embeddings in translation model. In phrase-based SMT, the translation model consists of phrase pairs. One way to modify the translation model to include embeddings would be to have a translation model that contains phrase embedding pairs instead of words in phrases. We would like to test

this new embeddings based translation model as a standalone translation model and also as an additional translation model that complements the standard word based translation model.

Bibliography

[1] Alexandre Allauzen, Josep M. Crego, İlknur Durgar El-Kahlout, and François Yvon. Limsi’s statistical translation systems for wmt’10. In Proceedings of the Joint Fifth

Workshop on Statistical Machine Translation and MetricsMATR, WMT ’10, pages

54–59, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics.

[2] Waleed Ammar, Victor Chahuneau, Michael Denkowski, Greg Hanneman, Wang Ling, Austin Matthews, Kenton Murray, Nicola Segall, Alon Lavie, and Chris Dyer. The CMU machine translation systems at WMT 2013: Syntax, synthetic translation op- tions, and pseudo-references. In Proceedings of the Eighth Workshop on Statistical

Machine Translation, pages 70–77, Sofia, Bulgaria, August 2013. Association for Com-

putational Linguistics.

[3] Necip Fazil Ayan, Bonnie J. Dorr, and Christof Monz. NeurAlign: Combining word alignments using neural networks. In Proceedings of Human Language Technology

Conference and Conference on Empirical Methods in Natural Language Processing,

pages 65–72, Vancouver, British Columbia, Canada, October 2005. Association for Computational Linguistics.

[4] Taylor Berg-Kirkpatrick, Alexandre Bouchard-Côté, John DeNero, and Dan Klein. Painless unsupervised learning with features. In Human Language Technologies: The

2010 Annual Conference of the North American Chapter of the Association for Com- putational Linguistics, pages 582–590, Los Angeles, California, June 2010. Association

for Computational Linguistics.

[5] Arianna Bisazza and Christof Monz. Class-based language modeling for translating into morphologically rich languages. In Proceedings of COLING 2014, the 25th Interna-

tional Conference on Computational Linguistics: Technical Papers, pages 1918–1927,

Dublin, Ireland, August 2014. Dublin City University and Association for Computa- tional Linguistics.

[6] Phil Blunsom and Trevor Cohn. Discriminative word alignment with conditional ran- dom fields. In Proceedings of the 21st International Conference on Computational

Linguistics and the 44th Annual Meeting of the Association for Computational Lin- guistics, ACL-44, pages 65–72, Stroudsburg, PA, USA, 2006. Association for Compu-

tational Linguistics.

[7] Phil Blunsom and Trevor Cohn. A hierarchical pitman-yor process hmm for unsu- pervised part of speech induction. In Proceedings of the 49th Annual Meeting of the

HLT ’11, pages 865–874, Stroudsburg, PA, USA, 2011. Association for Computational Linguistics.

[8] Jordan L. Boyd-Graber and David M. Blei. Multilingual topic models for unaligned text. CoRR, abs/1205.2657, 2012.

[9] Peter F. Brown, Peter V. deSouza, Robert L. Mercer, Vincent J. Della Pietra, and Jenifer C. Lai. Class-based n-gram models of natural language. Comput. Linguist., 18(4):467–479, December 1992.

[10] Peter F Brown, Vincent J Della Pietra, Stephen A Della Pietra, and Robert L Mercer. The mathematics of statistical machine translation: Parameter estimation. Computa-

tional linguistics, 19(2):263–311, 1993.

[11] Francisco Casacuberta and Enrique Vidal. Machine translation with inferred stochastic finite-state transducers. Computational Linguistics, 30(2):205–225, 2004.

[12] Sarath Chandar A P, Stanislas Lauly, Hugo Larochelle, Mitesh Khapra, Balaraman Ravindran, Vikas C Raykar, and Amrita Saha. An autoencoder approach to learning bilingual word representations. In Z. Ghahramani, M. Welling, C. Cortes, N.D. Lawrence, and K.Q. Weinberger, editors, Advances in Neural Information Processing

Systems 27, pages 1853–1861. Curran Associates, Inc., 2014.

[13] Colin Cherry and George Foster. Batch tuning strategies for statistical machine trans- lation. In Proceedings of the 2012 Conference of the North American Chapter of the

Association for Computational Linguistics: Human Language Technologies, pages 427–

436. Association for Computational Linguistics, 2012.

[14] Colin Cherry and Dekang Lin. Soft syntactic constraints for word alignment through discriminative training. In Proceedings of the COLING/ACL 2006 Main Conference

Poster Sessions, pages 105–112, Sydney, Australia, July 2006. Association for Compu-

tational Linguistics.

[15] Jonathan H Clark, Chris Dyer, Alon Lavie, and Noah A Smith. Better hypothesis testing for statistical machine translation: Controlling for optimizer instability. In Pro-

ceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers-Volume 2, pages 176–181. Association for

Computational Linguistics, 2011.

[16] Ronan Collobert and Jason Weston. A unified architecture for natural language pro- cessing: Deep neural networks with multitask learning. In Proceedings of the 25th

International Conference on Machine Learning, ICML ’08, pages 160–167, New York,

NY, USA, 2008. ACM.

[17] Josep M Crego and François Yvon. Factored bilingual n-gram language models for statistical machine translation. Machine Translation, 24(2):159–175, 2010.

[18] Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. Indexing by latent semantic analysis. JOURNAL OF THE AMER-

[19] Arthur P Dempster, Nan M Laird, and Donald B Rubin. Maximum likelihood from incomplete data via the em algorithm. Journal of the royal statistical society. Series B

(methodological), pages 1–38, 1977.

[20] Jacob Devlin, Rabih Zbib, Zhongqiang Huang, Thomas Lamar, Richard M Schwartz, and John Makhoul. Fast and robust neural network joint models for statistical machine translation. In ACL (1), pages 1370–1380. Citeseer, 2014.

[21] Paramveer S. Dhillon, Dean Foster, and Lyle Ungar. Multi-view learning of word embeddings via cca. In Advances in Neural Information Processing Systems (NIPS), volume 24, 2011.

[22] Nadir Durrani, Philipp Koehn, Helmut Schmid, and Alexander M Fraser. Investigating the usefulness of generalized word representations in smt. In COLING, pages 421–432, 2014.

[23] Nadir Durrani, Helmut Schmid, and Alexander Fraser. A joint sequence translation model with integrated reordering. In Proceedings of the 49th Annual Meeting of the

Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT ’11, pages 1045–1054, Stroudsburg, PA, USA, 2011. Association for Computa-

tional Linguistics.

[24] Manaal Faruqui and Chris Dyer. Improving vector space word representations using multilingual correlation. In Proceedings of EACL, volume 2014, 2014.

[25] Yang Feng, Trevor Cohn, and Xinkai Du. Factored markov translation with robust modeling. In CoNLL, pages 151–159, 2014.

[26] Irving J Good. The population frequencies of species and the estimation of population parameters. Biometrika, 40(3-4):237–264, 1953.

[27] Stephan Gouws and Anders Søgaard. Simple task-specific bilingual word embeddings. In Proceedings of the 2015 Conference of the North American Chapter of the Associa-

tion for Computational Linguistics: Human Language Technologies, pages 1386–1390,

Denver, Colorado, May–June 2015. Association for Computational Linguistics.

[28] Zellig S Harris. Distributional structure. Word, 1954.

[29] Saša Hasan, Juri Ganitkevitch, Hermann Ney, and Jesús Andrés-Ferrer. Triplet lexicon models for statistical machine translation. In Proceedings of the Conference on Empiri-

cal Methods in Natural Language Processing, EMNLP ’08, pages 372–381, Stroudsburg,

PA, USA, 2008. Association for Computational Linguistics.

[30] Kenneth Heafield, Ivan Pouzyrevsky, Jonathan H. Clark, and Philipp Koehn. Scalable modified Kneser-Ney language model estimation. In Proceedings of the 51st Annual

Meeting of the Association for Computational Linguistics, pages 690–696, Sofia, Bul-

garia, August 2013.

[31] Karl Moritz Hermann and Phil Blunsom. Multilingual Distributed Representations without Word Alignment. In Proceedings of ICLR, April 2014.

[32] Mark Hopkins and Jonathan May. Tuning as ranking. In Proceedings of the Conference

on Empirical Methods in Natural Language Processing, pages 1352–1362. Association

for Computational Linguistics, 2011.

[33] Mark Hopkins and Jonathan May. Tuning as ranking. In Proceedings of the 2011

Conference on Empirical Methods in Natural Language Processing, pages 1352–1362,

Edinburgh, Scotland, UK., July 2011. Association for Computational Linguistics.

[34] Kurt Hornik, Ingo Feinerer, Martin Kober, and Christian Buchta. Spherical k-means clustering. Journal of Statistical Software, 50(10):1–22, 2012.

[35] Eric H. Huang, Richard Socher, Christopher D. Manning, and Andrew Y. Ng. Im- proving word representations via global context and multiple word prototypes. In

Proceedings of the 50th Annual Meeting of the Association for Computational Linguis- tics: Long Papers - Volume 1, ACL ’12, pages 873–882, Stroudsburg, PA, USA, 2012.

Association for Computational Linguistics.

[36] Fei Huang and Alexander Yates. Distributional representations for handling sparsity in supervised sequence-labeling. In Proceedings of the Joint Conference of the 47th

Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1, ACL ’09, pages 495–503,

Stroudsburg, PA, USA, 2009. Association for Computational Linguistics.

[37] Alexandre Klementiev, Ivan Titov, and Binod Bhattarai. Inducing crosslingual dis- tributed representations of words. In Proceedings of the International Conference on

Computational Linguistics (COLING), Bombay, India, December 2012.

[38] R. Kneser and H. Ney. Improved backing-off for m-gram language modeling. In Acous-

tics, Speech, and Signal Processing, 1995. ICASSP-95., 1995 International Conference on, volume 1, pages 181–184 vol.1, May 1995.

[39] Kevin Knight. Decoding complexity in word-replacement translation models. Compu-

tational Linguistics, 25(4):607–615, 1999.

[40] Philipp Koehn. Pharaoh: a beam search decoder for phrase-based statistical machine translation models. In Machine translation: From real users to research, pages 115–124. Springer, 2004.

[41] Philipp Koehn. Statistical machine translation. Cambridge University Press, 2009.

[42] Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, et al. Moses: Open source toolkit for statistical machine translation. In Proceedings of the

45th annual meeting of the ACL on interactive poster and demonstration sessions,

pages 177–180. Association for Computational Linguistics, 2007.

[43] Philipp Koehn, Franz Josef Och, and Daniel Marcu. Statistical phrase-based trans- lation. In Proceedings of the 2003 Conference of the North American Chapter of the

Association for Computational Linguistics on Human Language Technology - Volume 1,

NAACL ’03, pages 48–54, Stroudsburg, PA, USA, 2003. Association for Computational Linguistics.

[44] Adam Lopez. Statistical machine translation. ACM Comput. Surv., 40(3):8:1–8:49, August 2008.

[45] Wolfgang Macherey, Franz Josef Och, Ignacio Thayer, and Jakob Uszkoreit. Lattice- based minimum error rate training for statistical machine translation. In Proceedings of

the Conference on Empirical Methods in Natural Language Processing, pages 725–734.

Association for Computational Linguistics, 2008.

[46] José B Marino, Rafael E Banchs, Josep M Crego, Adria de Gispert, Patrik Lambert, José AR Fonollosa, and Marta R Costa-Jussà. N-gram-based machine translation.

Computational Linguistics, 32(4):527–549, 2006.

[47] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781, 2013.

[48] Tomas Mikolov, Quoc V. Le, and Ilya Sutskever. Exploiting similarities among lan- guages for machine translation. CoRR, abs/1309.4168, 2013.

[49] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In C.J.C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger, editors, Advances in

Neural Information Processing Systems 26, pages 3111–3119. Curran Associates, Inc.,

2013.

[50] Robert C. Moore, Wen-tau Yih, and Andreas Bode. Improved discriminative bilingual word alignment. In Proceedings of the 21st International Conference on Computational

Linguistics and 44th Annual Meeting of the Association for Computational Linguistics,

pages 513–520, Sydney, Australia, July 2006. Association for Computational Linguis- tics.

[51] Jan Niehues, Teresa Herrmann, Stephan Vogel, and Alex Waibel. Wider context by using bilingual language models in machine translation. In Proceedings of the Sixth

Workshop on Statistical Machine Translation, pages 198–206. Association for Compu-

tational Linguistics, 2011.

[52] Franz J Och. Maximum-likelihood-schätzung von wortkategorien mit verfahren der kombinatorischen optimierung. Studienarbeit, Friedrich-Alexander-Universität, Erlangen-Nürnberg, Germany, 1995.

[53] Franz Josef Och and Hermann Ney. A systematic comparison of various statistical alignment models. Comput. Linguist., 29(1):19–51, March 2003.

[54] Franz Josef Och, Christoph Tillmann, Hermann Ney, et al. Improved alignment models for statistical machine translation. In Proc. of the Joint SIGDAT Conf. on Empirical

Methods in Natural Language Processing and Very Large Corpora, pages 20–28, 1999.

[55] Franz Josef Och, Nicola Ueffing, and Hermann Ney. An efficient a* search algorithm for statistical machine translation. In Proceedings of the workshop on Data-driven

methods in machine translation-Volume 14, pages 1–8. Association for Computational

[56] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual

meeting on association for computational linguistics, pages 311–318. Association for

Computational Linguistics, 2002.

[57] Fernando Pereira, Naftali Tishby, and Lillian Lee. Distributional clustering of English words. In Proceedings of the ACL, pages 183–190, 1993.

[58] Radim Řehůřek and Petr Sojka. Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP

Frameworks, pages 45–50, Valletta, Malta, May 2010. ELRA.

[59] Sam T Roweis and Lawrence K Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.

[60] J.W. Sammon. A nonlinear mapping for data structure analysis. IEEE Transactions

on Computers, 18(5):401–409, 1969.

[61] Darlene Stewart, Roland Kuhn, Eric Joanis, and George Foster. Coarse “split and lump”bilingual language models for richer source information in smt. In Proceedings

of the eleventh Conference of the Association for Machine Translation in the Americas,

pages 28–41, 2014.

[62] Karl Stratos, Do-kyum Kim, Michael Collins, and Daniel Hsu. A spectral algorithm for learning class-based n-gram models of natural language. Proceedings of the Association

for Uncertainty in Artificial Intelligence, 2014.

[63] Ben Taskar, Simon Lacoste-Julien, and Dan Klein. A discriminative matching ap- proach to word alignment. In Proceedings of the Conference on Human Language

Technology and Empirical Methods in Natural Language Processing, HLT ’05, pages

73–80, Stroudsburg, PA, USA, 2005. Association for Computational Linguistics.

[64] Joshua B Tenenbaum, Vin De Silva, and John C Langford. A global geometric frame- work for nonlinear dimensionality reduction. science, 290(5500):2319–2323, 2000.

[65] Warren S Torgerson. Multidimensional scaling: I. theory and method. Psychometrika, 17(4):401–419, 1952.

[66] Kristina Toutanova, Dan Klein, Christopher D Manning, and Yoram Singer. Feature- rich part-of-speech tagging with a cyclic dependency network. In Proceedings of the

2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1, pages 173–180. Association for

Computational Linguistics, 2003.

[67] Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal

of Machine Learning Research, 9(2579-2605):85, 2008.

[68] Ashish Vaswani, Yinggong Zhao, Victoria Fossum, and David Chiang. Decoding with large-scale neural language models improves translation. In EMNLP, pages 1387–1392. Citeseer, 2013.

[69] I. H. Witten and T. C. Bell. The zero-frequency problem: estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory, 37(4):1085–1094, Jul 1991.

[70] Hui Zhang, Kristina Toutanova, Chris Quirk, and Jianfeng Gao. Beyond left-to-right: Multiple decomposition structures for smt. In HLT-NAACL, pages 12–21, 2013.

[71] Bing Zhao and Eric P. Xing. Bitam: Bilingual topic admixture models for word alignment. In In Proceedings of the 44th Annual Meeting of the Association for Com-

putational Linguistics (ACL’06, 2006.

[72] Bing Zhao, Eric P. Xing, and Alex Waibel. Bilingual word spectral clustering for statis- tical machine translation. In Proceedings of the ACL Workshop on Building and Using

Parallel Texts, ParaText ’05, pages 25–32, Stroudsburg, PA, USA, 2005. Association

for Computational Linguistics.

[73] Will Y. Zou, Richard Socher, Daniel Cer, and Christopher D. Manning. Bilingual word embeddings for phrase-based machine translation. In Proceedings of the 2013

Conference on Empirical Methods in Natural Language Processing (EMNLP 2013),

In document Informe sobre el examen del Perú (página 34-39)