2012 Volume 27
RESEARCH ARTICLE   Open Access    

An overview of the phrase-based statistical machine translation techniques

  • Abstract: This work provides a general overview of statistical machine translation (SMT), a subfield of machine translation (MT). Specifically, it focuses on one of the most popular SMT approaches: the phrase-based system. Phrase-based translation units are typically extracted using statistical criteria and weighted using different models. These models are combined log-linearly during decoding, the step in charge of choosing the most probable translation. Significant quality improvements have been achieved over the original phrase-based SMT systems. Among others, the main challenges are reordering, domain adaptation and evaluation.

  • Cite this article

    Marta Ruiz Costa-jussà. 2012. An overview of the phrase-based statistical machine translation techniques. The Knowledge Engineering Review 27(4), 413−431, doi: 10.1017/S026988891200029X


    • The author would like to thank her PhD colleagues Patrik Lambert and Josep Maria Crego for their support and Simon Parsons for motivating this work. The author also wants to thank Barcelona Media Innovation Centre for its permission to publish this paper.

    • This work has been partially funded by the Spanish Ministry of Economy and Competitiveness through the Juan de la Cierva fellowship program.

    • http://www.nist.gov/speech/tests/mt/doc/mt06–evalplan.v4.pdf

    • http://www.aclweb.org/anthology/W/W06/W06-15

    • http://www.slc.atr.jp/IWSLT2006/

    • http://www.statmt.org/moses/

    • http://www.statmt.org/moses/

    • http://www.statmt.org/wmt[07-10]/

    • http://www.caitra.org

    • Copyright © Cambridge University Press 2012