2015 Volume 30
RESEARCH ARTICLE   Open Access    

Domain adaptation strategies in statistical machine translation: a brief overview

  • Abstract: Statistical machine translation (SMT) is gaining interest because it can easily be adapted to any pair of languages. One of the main challenges in SMT is domain adaptation, because translation performance drops when testing conditions deviate from training conditions. A growing body of research addresses this challenge, focusing on exploiting every kind of material that is available. This paper provides an overview of research that addresses the domain adaptation challenge in SMT.

  • Cite this article

    Marta R. Costa-Jussà. 2015. Domain adaptation strategies in statistical machine translation: a brief overview. The Knowledge Engineering Review 30(5), 514–520. doi: 10.1017/S0269888915000119



The Knowledge Engineering Review, 2015, 30(5): 514–520


    • This work has been funded by the Seventh Framework Program of the European Commission through the International Outgoing Fellowship Marie Curie Action (IMTraP-2011-29951), the Spanish Ministerio de Economía y Competitividad, contract TEC2012-38939-C03-02 and the European Regional Development Fund (ERDF/FEDER).

    • http://www.statmt.org/wmt14

    • We refer to handling the problem when data is missing.

    • http://www.accurat-project.eu

    • http://translate.google.com/toolkit

    • http://www.casmacat.eu

    • http://www.caitra.org

    • © Cambridge University Press, 2015